EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai

The response is too short to extract an answer on GPQA. What should I set to extend it? #2081

Open · URRealHero opened this issue 4 months ago

URRealHero commented 4 months ago

Running the following command:

lm_eval --model local-chat-completions --tasks gpqa_main_cot_zeroshot --model_args model=Qwen/Qwen2-72B-Instruct,base_url=https://api.together.xyz/v1 --output_path ./gpqa/result/Qwen2 --use_cache ./gpqa/cache/Qwen2 --log_samples --limit 10 --gen_kwargs temperature=0.7,max_tokens=8192

Qwen2's responses end abruptly, as in the attached screenshot: only 256 tokens are generated. I'm wondering why this happens. Is there a problem with max_tokens?
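For context: the harness applies its own generation-length cap, max_gen_toks, which defaults to 256 tokens in several model backends, and that default matches the truncation seen here. Below is a minimal sketch of the same run with the cap raised via --gen_kwargs; it assumes your harness version forwards max_gen_toks from --gen_kwargs to the chat-completions backend, which is worth verifying:

# Same run as above, but overriding max_gen_toks (the harness's own
# generation-length cap, default 256) instead of the API-side max_tokens.
lm_eval --model local-chat-completions \
  --tasks gpqa_main_cot_zeroshot \
  --model_args model=Qwen/Qwen2-72B-Instruct,base_url=https://api.together.xyz/v1 \
  --output_path ./gpqa/result/Qwen2 \
  --use_cache ./gpqa/cache/Qwen2 \
  --log_samples --limit 10 \
  --gen_kwargs temperature=0.7,max_gen_toks=8192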

KADCA21 commented 1 month ago

Same issue here. Maybe try adding max_gen_toks: 2048 in _gpqa_cot_zeroshot_yaml (screenshot of the edited file attached).
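For reference, the change would look something like the following in the gpqa cot_zeroshot task's _gpqa_cot_zeroshot_yaml. This is an illustrative sketch, not the file's verbatim contents: the keys other than max_gen_toks are assumptions about the existing generation_kwargs block, and only max_gen_toks is the actual addition.

# Excerpt of _gpqa_cot_zeroshot_yaml (illustrative); only generation_kwargs shown.
generation_kwargs:
  until:
    - "</s>"            # assumed existing stop sequence; keep whatever the file already has
  do_sample: false      # assumed existing setting
  max_gen_toks: 2048    # the addition: raises the 256-token default so the CoT answer is not cut off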