bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

ValueError regarding "--max_length_generation" #243

Closed aladinggit closed 3 weeks ago

aladinggit commented 3 weeks ago

Hi, I am running MBPP on a Llama-2 model. However, whatever "--max_length_generation" I set, the benchmark fails.

My command to execute: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch main.py --model meta-llama/Llama-2-7b-hf --tasks mbpp --precision bf16 --max_length_generation 1024 --allow_code_execution

The output failure message: ValueError: Input length of input_ids is 1024, but max_length is set to 1024. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.

It seems to me that the length of input_ids somehow increases along with --max_length_generation? Because of this I am not able to run the MBPP benchmark at all. Thanks!
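
The same error reproduces outside the harness with a plain generate call. A minimal sketch (not the harness's code path, and using gpt2 as a stand-in model to avoid the gated Llama-2 weights):

```python
# Minimal repro sketch: in transformers, max_length bounds the TOTAL sequence
# (prompt tokens plus generated tokens), so a prompt that already fills
# max_length leaves no room to generate anything.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

long_prompt = "pass\n" * 1024  # tokenizes to well over 1024 tokens
inputs = tokenizer(long_prompt, return_tensors="pt")
print(inputs.input_ids.shape[1])  # >= 1024

# On recent transformers this raises:
# ValueError: Input length of input_ids is ..., but `max_length` is set to 1024.
model.generate(**inputs, max_length=1024)
```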

loubnabnl commented 3 weeks ago

It seems some of the prompts are long: they consume all the tokens allowed by --max_length_generation, leaving no room for generation. Can you try setting it to a larger value, such as 2048? (Some tokenizers produce more tokens than others.)
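
For example, one way to pick a safe value is to count the prompt tokens directly. A rough sketch: it approximates the prompt as problem text plus tests, which is not exactly the MBPP prompt the harness builds (few-shot examples can make it longer), and the Llama-2 tokenizer repo is gated:

```python
# Rough check: how many tokens do MBPP prompts take with the Llama-2 tokenizer?
# This helps choose a --max_length_generation with headroom for generated code.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # gated repo
ds = load_dataset("mbpp", split="test")

# Approximate each prompt as the problem text plus its asserted tests.
lengths = [
    len(tokenizer(ex["text"] + "\n" + "\n".join(ex["test_list"])).input_ids)
    for ex in ds
]
print(f"max prompt tokens: {max(lengths)}, mean: {sum(lengths) / len(lengths):.0f}")
```

If the longest prompt comes in well under the limit, rerunning with --max_length_generation 2048 should leave enough room and avoid the ValueError.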