TrustedLLM / UnKE


Loss becomes NaN when running Qwen1.5-7B-Chat #1

Closed · Kahsolt closed this 2 months ago

Kahsolt commented 2 months ago

What I did:

What I got:

[screenshot: training log showing the loss becoming NaN]

What is going wrong here? Please help...

DJC-GO-SOLO commented 2 months ago

Thank you for your attention; I will reply to you sometime tomorrow.

DJC-GO-SOLO commented 2 months ago

Could you tell me which version of the transformers library you are using?

Kahsolt commented 2 months ago

I just returned from traveling, so I'm sorry for the slightly late reply. 🤧 I'm using Python 3.9.0, numpy 1.24.1, torch 2.1.0, and transformers 4.43.3; I hope this helps!

Note: even after switching to transformers 4.40.1, as the latest commit requires, the NaN still occurs. I run the model in float16; might this be the cause?
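
One quick way to test the float16-overflow hypothesis would be a check like the sketch below (the function name and call site are illustrative only, not part of the UnKE code): float16 saturates above roughly 65504, so large hidden states or logits can overflow to inf, which then turns the loss into NaN.

```python
import torch

def check_finite(loss: torch.Tensor, logits: torch.Tensor) -> None:
    # float16 overflows above ~65504, so large hidden states or logits
    # can become inf, which then propagates into a NaN loss.
    if not torch.isfinite(logits).all():
        print("non-finite logits detected (likely fp16 overflow)")
    if not torch.isfinite(loss).all():
        print(f"non-finite loss: {loss}")
```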

Kahsolt commented 2 months ago

Well, I changed float16 to bfloat16, and now it works perfectly! Thanks for your help; this issue can be closed now. :)
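
For anyone hitting the same problem, the change boils down to loading the model in bfloat16 instead of float16. A minimal sketch, assuming the model is loaded through transformers' `AutoModelForCausalLM` (the exact loading code in the UnKE scripts may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # was torch.float16; bf16 keeps fp32's exponent range
    device_map="auto",
)
```

bfloat16 has the same exponent range as float32, so it avoids the overflow-induced NaNs at the cost of some mantissa precision; note that efficient bf16 support generally requires an Ampere-or-newer GPU.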