bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

santacoder fp16 causes NaN on humaneval? #83

Closed ywen666 closed 1 year ago

ywen666 commented 1 year ago

Do we need to use fp32 when evaluating SantaCoder? I tried fp16 evaluation because I fine-tuned SantaCoder on the stack-dedup Python dataset for 1000 steps with fp16 precision. But when I run fp16 evaluation on HumanEval, it fails with the following error (for both --model=bigcode/santacoder and --model=myfp16_finetuned_santacoder):

File "/home/ywen/miniconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 2583, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
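For what it's worth, this error is consistent with an fp16 overflow: fp16 caps out near 65504, so a large logit or intermediate activation becomes inf, the softmax then produces nan, and torch.multinomial rejects the probability tensor. A minimal sketch of that failure mode (illustrative only, not the exact path the model takes):

import torch

# fp16 caps out near 65504, so a large logit overflows to inf
logits = torch.tensor([70000.0, 1.0], dtype=torch.float16)
print(logits)  # tensor([inf, 1.], dtype=torch.float16)

# inf survives the upcast; softmax over a tensor containing inf yields nan
probs = torch.softmax(logits.float(), dim=-1)
print(probs)  # tensor([nan, nan])

# sampling then fails with the error from the traceback above:
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
torch.multinomial(probs, num_samples=1)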

The error goes away if I use --precision=fp32, which yields 37.19% pass@100 on HumanEval, close to the number reported in the paper. This is the command I used to run the fp16 evaluation on HumanEval:

accelerate launch main.py \
    --model bigcode/santacoder \
    --max_length_generation 368 \
    --tasks humaneval \
    --temperature 0.4 \
    --n_samples 100 \
    --batch_size 20 \
    --allow_code_execution \
    --trust_remote_code \
    --use_auth_token \
    --generation_only \
    --precision fp16 \
    --save_generations 
loubnabnl commented 1 year ago

Yes, SantaCoder had some issues with fp16, so it's better to use either bf16 or fp32 with it.
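For anyone landing here, a minimal sketch of loading the model in bf16 with transformers (outside the harness; within the harness, passing --precision bf16 instead of fp16 should have the same effect, assuming your GPU supports bf16):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# bf16 keeps the same exponent range as fp32, so it avoids the
# overflow-to-inf problem fp16 hits, at the cost of mantissa precision
tokenizer = AutoTokenizer.from_pretrained("bigcode/santacoder")
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/santacoder",
    torch_dtype=torch.bfloat16,  # or torch.float32 on hardware without bf16
    trust_remote_code=True,      # santacoder ships custom modeling code
)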

xxrjun commented 1 month ago

> Yes, SantaCoder had some issues with fp16, so it's better to use either bf16 or fp32 with it.

I encountered the same error while evaluating HumanEval on meta-llama/Llama-2-7b-chat-hf with fp16 precision. However, the evaluation completed without issues for the CodeLlama series models and for meta-llama/Llama-2-7b-hf. Could you explain why this happens, and what impact changing the precision from fp16 to fp32 has on the pass@k score? Thank you very much for your assistance!
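For reference on the second part of the question: pass@k is typically computed with the unbiased estimator from the Codex/HumanEval paper, so precision can only affect the score through how many of the n generated samples pass the unit tests. A sketch of that estimator:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from the Codex paper: n samples, c of them correct."""
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. if 37 of 100 samples for a task are correct, changing precision
# shifts c (which samples pass) and hence the per-task pass@k
print(pass_at_k(n=100, c=37, k=1))   # ~0.37
print(pass_at_k(n=100, c=37, k=10))  # much higher with more attempts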