"nan" loss - Githubissues

ShihaoZhaoZSH / LaVi-Bridge

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

MIT License

287 stars, 20 forks

Open tanshuai0219 opened 2 months ago

tanshuai0219 commented 2 months ago

When I test llama2+transformer, I always get a NaN loss after a few hundred steps. Could you give me some advice?

tanshuai0219 commented 2 months ago

Is the learning rate setting critical?

ShihaoZhaoZSH commented 2 months ago

Yes. We recommend using a relatively high learning rate, such as 1e-4, for training the transformer models.

tanshuai0219 commented 2 months ago

Thanks for your quick reply. I am already using 1e-4 as the lr. I will try a few more things.
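
For anyone debugging the same symptom, one way to narrow it down is to record the exact step at which the loss stops being finite, and the last good value before it. The snippet below is only a minimal sketch in plain Python, not part of LaVi-Bridge: `NanGuard` is a hypothetical helper and the loss values are made up for illustration. In a PyTorch training loop, one would presumably pass `loss.item()` to `check`.

```python
import math


class NanGuard:
    """Hypothetical helper: tracks where the loss first becomes NaN or inf."""

    def __init__(self):
        self.step = 0               # number of losses seen so far
        self.last_finite = None     # most recent finite loss value
        self.first_bad_step = None  # step index of the first non-finite loss

    def check(self, loss_value):
        """Return True if the loss is finite; remember the first step where it is not."""
        ok = math.isfinite(loss_value)
        if ok:
            self.last_finite = loss_value
        elif self.first_bad_step is None:
            self.first_bad_step = self.step
        self.step += 1
        return ok


# Illustrative values only, not real training output.
guard = NanGuard()
for loss in [2.31, 1.87, 1.52, float("nan"), float("nan")]:
    if not guard.check(loss):
        # In a real loop: skip the optimizer step, dump the batch, or stop here.
        break

print(guard.first_bad_step, guard.last_finite)  # 3 1.52
```

Knowing the step and the last finite loss makes it easier to tell whether the loss blew up gradually (which often points at the learning rate) or jumped to NaN in a single step (which often points at a bad batch or a numerical overflow).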