Open tanshuai0219 opened 2 months ago
When I test llama2+transformer, I always get a NaN loss after a few hundred steps. Could you give me some advice?
Is it critical to set the learning rate?
Yes. We recommend using a relatively higher learning rate, such as 1e-4, for training the transformer models,
Thanks for your quick reply. I am already using 1e-4 as the learning rate, so I will keep experimenting.
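When trying different settings, it may help to record the exact step at which the loss first stops being finite, so that runs with different learning rates can be compared. The following is only a minimal sketch in plain Python: the helper name `first_non_finite_step` and the sample loss values are hypothetical and are not taken from this repository.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical loss history, for illustration only.
history = [2.31, 1.87, 1.52, float("nan"), float("nan")]
print(first_non_finite_step(history))
```

In a real training loop the same check could be applied to the scalar loss at every step, stopping the run as soon as it fails instead of continuing with NaN values.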