OpenGVLab / VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0
660 stars 47 forks

Grad is NaN #28

Open isyangshu opened 2 months ago

isyangshu commented 2 months ago

Hello,

Thank you for your great work.

While attempting to fine-tune VideoMamba on my own dataset, I ran into a problem. With a batch size of 4 on two H800 GPUs, training works well. However, when I increase the batch size to 8, 16, or 32, the loss still looks normal, but the gradients of the parameters become NaN, which prevents the model from training.

Could you please provide some guidance on how to resolve this gradient issue when using larger batch sizes?

Andy1621 commented 2 months ago

Hi! Could you try a smaller learning rate? You could also try adjusting the optimizer hyperparameters, for example β1, β2, ϵ = 0.9, 0.98, 1e-6.
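For reference, a minimal sketch of what that suggestion could look like with PyTorch's `torch.optim.AdamW`. The helper names `adamw_kwargs` and `build_optimizer`, and the `lr_scale` factor, are hypothetical and not part of the VideoMamba code base; only the β1, β2, ϵ values come from the comment above, and how much to shrink the learning rate is something you would need to tune.

```python
def adamw_kwargs(base_lr, lr_scale=0.1):
    """Return AdamW keyword arguments with the suggested betas/eps.

    base_lr  : the learning rate currently in use (taken from your own config)
    lr_scale : hypothetical shrink factor; 0.1 is only an illustrative default
    """
    return {
        "lr": base_lr * lr_scale,   # smaller learning rate
        "betas": (0.9, 0.98),       # beta1, beta2 as suggested
        "eps": 1e-6,                # larger epsilon than the AdamW default of 1e-8
    }


def build_optimizer(params, base_lr, lr_scale=0.1):
    """Build an AdamW optimizer from the kwargs above (requires PyTorch)."""
    import torch  # imported lazily so the helper above works without PyTorch

    return torch.optim.AdamW(params, **adamw_kwargs(base_lr, lr_scale))
```

Usage would be along the lines of `optimizer = build_optimizer(model.parameters(), base_lr=your_current_lr)`, assuming the training script lets you swap in your own optimizer.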

sailor-z commented 2 months ago

Hi, I'm facing the same issue. I've tried different learning rates, but I still get NaN occasionally.
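When the NaN only shows up occasionally, one possible workaround (not something confirmed by the maintainers) is to check the gradient norm before each update and skip the step when it is not finite. The sketch below assumes a plain PyTorch training loop without a gradient scaler; `guarded_step`, `grad_norm_is_finite`, and the `max_norm` value are hypothetical names and settings chosen for illustration.

```python
import math


def grad_norm_is_finite(total_norm):
    """True when the gradient norm is a usable number (neither NaN nor Inf)."""
    return math.isfinite(total_norm)


def guarded_step(model, optimizer, loss, max_norm=1.0):
    """Run backward and update, skipping the update on non-finite gradients.

    Returns True if the optimizer stepped, False if the step was skipped.
    max_norm is an assumed clipping threshold, not a value from the thread.
    """
    import torch  # imported lazily; only this function needs PyTorch

    optimizer.zero_grad(set_to_none=True)
    loss.backward()

    # clip_grad_norm_ returns the total norm computed before clipping
    total_norm = float(
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    )

    if not grad_norm_is_finite(total_norm):
        # Discard the bad gradients so they never reach the weights
        optimizer.zero_grad(set_to_none=True)
        return False

    optimizer.step()
    return True
```

Skipping steps only hides the symptom, so it may also help to count how often `guarded_step` returns False; frequent skips would suggest the instability needs a real fix rather than a guard.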

softsweets commented 1 month ago

Hi, I'm facing the same issue. Have you solved it?

niuniujiao commented 1 month ago

Hi, I'm facing the same issue. Have you solved it?