Closed fancyerii closed 8 months ago
I have tested Llama 2 13B and 70B on MMLU with version 4.0. My 5-shot result for 70B is 0.632, which is lower than the paper's reported result (0.68).
13b 0-shot
70b-chat 0-shot
70b-chat 5-shot, using parallelize=True
CUDA_VISIBLE_DEVICES="1,2,3,4,5,6,7" lm-eval --model hf --model_args pretrained=/nas/lili/models_hf/70B-chat-hf,parallelize=True --tasks mmlu --device cuda --batch_size 1 --num_fewshot 5
The scores for LLaMA and LLaMA 2 are generally considered irreproducible because they were obtained with custom, undisclosed prompts.
thank you.