dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Apache License 2.0
3.22k stars 280 forks

loss 0 and grad nan #123

Closed TuuSiwei closed 6 months ago

TuuSiwei commented 6 months ago

When I run the ./stage_1_2_full_gemma_v2b_336_hr_768.sh fine-tuning script, I get a loss of 0 and NaN gradients. Why does this happen?

Hayoung93 commented 5 months ago

Hi @tsw123678, may I ask how you solved this problem?

TuuSiwei commented 5 months ago

> Hi @tsw123678, may I ask how you solved this problem?

Yes, I found that it was caused by using too recent a version of DeepSpeed.

Hayoung93 commented 5 months ago

Thanks a lot! For anyone else's reference, the recommended versions can be found in another issue.
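
For anyone comparing their environment against the recommended versions, here is a minimal sketch that lists what is installed. It is not from the repo: the helper name `installed_versions` is hypothetical, and the package list (`deepspeed`, `torch`, `transformers`) is an assumption about which packages matter here. It only reports versions; it does not know which ones are the recommended ones.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed package list; adjust it to match the repo's requirements.
    for name, ver in installed_versions(["deepspeed", "torch", "transformers"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the training environment before launching the script makes it easy to spot a DeepSpeed version that differs from the recommended one.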