bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!

Deprecated warning during inference with starcoder fp16 #133

Open code2graph opened 11 months ago

code2graph commented 11 months ago

I installed all the dependencies by following the instructions from the repo.

Then I run the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# to save memory consider using fp16 or bf16 by specifying torch_dtype=torch.float16 for example
model = AutoModelForCausalLM.from_pretrained(checkpoint,
                                             device_map="auto",
                                             torch_dtype=torch.float16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)

# clean_up_tokenization_spaces=False prevents a tokenizer edge case which can result in spaces being removed around punctuation
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))

But I am getting deprecation warnings:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:45<00:00,  6.46s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
/home/xxxx/.local/lib/python3.10/site-packages/transformers/generation/utils.py:1313: UserWarning: Using `max_length`'s default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
def print_hello_world():
    print("Hello World")

def print_hello_
ArmelRandy commented 11 months ago

Hi, the warning is there to suggest that you use `max_new_tokens` instead of the default `max_length`. `max_length` is the total length (in tokens) of the prompt (the input sequence) plus the tokens generated during inference, while `max_new_tokens` counts only the tokens generated during inference. The two arguments are closely related and you don't need both, since one can be derived from the other. By default `max_length` is used and it is set to 20, but this argument will be deprecated; `max_new_tokens` overrides `max_length` when it is set.
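
For example, a minimal adjustment of the generation call from the snippet above (the value 50 for `max_new_tokens` is an arbitrary choice for illustration) that passes `max_new_tokens` explicitly, and also sets `pad_token_id` to quiet the open-end generation warning:

# Reuses model, tokenizer and inputs from the snippet above.
outputs = model.generate(
    inputs,
    max_new_tokens=50,                      # tokens to generate, not counting the prompt
    pad_token_id=tokenizer.eos_token_id,    # silences the "Setting `pad_token_id`" message
)
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))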