```shell
lm_eval --model local-chat-completions --tasks gpqa_main_cot_zeroshot --model_args model=Qwen/Qwen2-72B-Instruct,base_url=https://api.together.xyz/v1 --output_path ./gpqa/result/Qwen2 --use_cache ./gpqa/cache/Qwen2 --log_samples --limit 10 --gen_kwargs temperature=0.7,max_tokens=8192
```
Using this command, Qwen2's outputs end abruptly, as in the image below.
To be specific, only 256 tokens are generated per response. I'm wondering why this happens. Is there a problem with how max_tokens is being passed?
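To narrow down where the cap comes from, here is a minimal sanity-check sketch. It assumes the Together endpoint follows the standard OpenAI chat-completions schema and that the API key is in a `TOGETHER_API_KEY` environment variable (both assumptions, not taken from lm-eval itself). It sends the same `max_tokens=8192` directly and inspects `finish_reason` and the completion token count:

```python
# Hypothetical sanity check: call the same OpenAI-compatible endpoint directly,
# bypassing lm-eval, to see whether the 256-token cap comes from the API server
# or from how lm-eval builds the request.
import os
import requests

API_KEY = os.environ["TOGETHER_API_KEY"]  # assumed env var name

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "Qwen/Qwen2-72B-Instruct",
        "messages": [
            {"role": "user", "content": "Explain entropy step by step, in detail."}
        ],
        "temperature": 0.7,
        "max_tokens": 8192,  # same value passed via --gen_kwargs above
    },
    timeout=120,
)
resp.raise_for_status()
data = resp.json()
choice = data["choices"][0]

# finish_reason == "length" means the server itself truncated the generation;
# "stop" means the model ended naturally and the cap is being applied elsewhere.
print("finish_reason:", choice["finish_reason"])
print("completion_tokens:", data.get("usage", {}).get("completion_tokens"))
```

If this direct call also stops around 256 tokens, the limit is on the server side; if it generates much longer, then `max_tokens` is presumably being dropped or overridden somewhere between `--gen_kwargs` and the actual request.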