ShihaoZhaoZSH / LaVi-Bridge

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
MIT License
287 stars 20 forks

"nan" loss #13

Open tanshuai0219 opened 2 months ago

tanshuai0219 commented 2 months ago

When I test llama2+transformer, I always get a NaN loss after a few hundred steps. Could you give me some advice?

tanshuai0219 commented 2 months ago

Is it critical to set the learning rate?

ShihaoZhaoZSH commented 2 months ago

Yes. We recommend using a relatively high learning rate, such as 1e-4, for training the transformer models.

tanshuai0219 commented 2 months ago

Thanks for your quick reply. I am already using 1e-4 as the learning rate. I will try a few more things.
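
Since the loss only turns into NaN after a few hundred steps, it may help to record the exact step at which it first stops being finite. The following is a minimal sketch of such a check, not code from LaVi-Bridge: the helper name `first_non_finite` and the sample loss values are hypothetical and only illustrate the idea.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical logged loss values (not taken from this issue).
logged = [0.91, 0.74, 0.68, float("nan"), float("nan")]
bad_step = first_non_finite(logged)
print("first non-finite loss at step:", bad_step)
```

In a PyTorch training loop, the same `math.isfinite` test could be applied to `loss.item()` before the optimizer step, assuming the loss is a scalar tensor, so that the batch and step that first produce the NaN can be inspected.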