Open · varuniyer opened this issue 1 month ago
It is just a warning and there is no significant effect on the output. The `max_length` in `generation_config.json` is used by Hugging Face transformers inference and is not used by vLLM inference. If you'd like to resolve this warning, try setting `max_position_embeddings` to 8192 in the model's `config.json`.
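For reference, a minimal sketch of that edit on a locally downloaded copy of the model (the path below is a placeholder, not from this thread; alternatively, vLLM accepts a `max_model_len` argument at engine construction if you prefer not to modify the config):

```python
# Minimal sketch, assuming a local copy of the model directory.
import json
from pathlib import Path

config_path = Path("/path/to/model/config.json")  # hypothetical location

# Load the existing Hugging Face model config.
config = json.loads(config_path.read_text())

# Raise the position-embedding limit so prompts longer than 2048 tokens
# are accepted; 8192 matches the value suggested above.
config["max_position_embeddings"] = 8192

config_path.write_text(json.dumps(config, indent=2))
```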
When attempting to use prompts longer than 2048 tokens, I get the following error (even with `-mx 3072` passed in):

`Input prompt (2083 tokens) is too long and exceeds limit of 2048`

In the model's `config.json` on Hugging Face, `max_length` is set to 4096. Is it possible (and/or reasonable) to use longer prompts to get evaluation scores?