Closed. TuuSiwei closed this issue 6 months ago.
When I run ./stage_1_2_full_gemma_v2b_336_hr_768.sh for finetuning, I get a loss of 0 and NaN gradients. Why is that?
Hi @tsw123678, may I ask how you solved this problem?
Yes, I found it was caused by using too new a version of deepspeed.
Thanks a lot! For anyone else's information, the recommended versions can be found in another issue.
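For anyone landing here with the same symptom, a minimal sketch for checking which versions are actually installed before comparing them with the recommended ones. The helper names `installed_version` and `report` are made up for illustration and are not part of the repo. This thread does not state the working deepspeed version, so none is hard-coded here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Packages likely relevant to this issue; extend the list as needed.
    for name, version in report(["deepspeed", "torch", "transformers"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the printed deepspeed version is newer than the recommended one, reinstalling the recommended version with pip and rerunning the script is probably the first thing to try.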