bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!

Deprecated warning during inference with starcoder fp16 #133

Open code2graph opened 11 months ago

code2graph commented 11 months ago

I installed all the dependencies by following the instructions from the repo.

Then I run the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# to save memory consider using fp16 or bf16 by specifying torch_dtype=torch.float16 for example
model = AutoModelForCausalLM.from_pretrained(checkpoint,
                                             device_map="auto",
                                             torch_dtype=torch.float16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)

# clean_up_tokenization_spaces=False prevents a tokenizer edge case which can result in spaces being removed around punctuation
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))

But I am getting deprecation warnings:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:45<00:00,  6.46s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
/home/xxxx/.local/lib/python3.10/site-packages/transformers/generation/utils.py:1313: UserWarning: Using `max_length`'s default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
def print_hello_world():
    print("Hello World")

def print_hello_
ArmelRandy commented 11 months ago

Hi, the warning is there to suggest that you use `max_new_tokens` instead of the default `max_length`. `max_length` is the total length (in tokens) of the prompt (the input sequence) plus the tokens generated during inference, while `max_new_tokens` counts only the tokens generated during inference. The two arguments are closely related and you don't need both, since one can be derived from the other. By default `max_length` is used and it is set to 20, but this argument will be deprecated; `max_new_tokens` overrides `max_length` when it is set.
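
For example, a minimal adjustment of the generation call from the snippet above (the value 50 for `max_new_tokens` is an arbitrary choice for illustration) that passes `max_new_tokens` explicitly, and also sets `pad_token_id` to quiet the open-end generation warning:

# Reuses model, tokenizer and inputs from the snippet above.
outputs = model.generate(
    inputs,
    max_new_tokens=50,                      # tokens to generate, not counting the prompt
    pad_token_id=tokenizer.eos_token_id,    # silences the "Setting `pad_token_id`" message
)
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))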