Open isyangshu opened 2 months ago
Hi! Can you try a smaller learning rate? Besides, maybe you can try adjusting the optimizer hyperparameters, e.g. β1, β2, ε = 0.9, 0.98, 1e-6.
Hi, I'm facing the same issue. I've tried different learning rates, but I still get NaN occasionally.
Hi, I'm facing the same issue. Have you solved it?
Hello,
Thank you for your great work.
While attempting to fine-tune VideoMamba on my own dataset, I encountered some issues. With a batch size of 4 on two H800 GPUs, training worked well. However, when I increased the batch size to 8/16/32, the loss still looked fine, but the gradients of the parameters became NaN, which prevented the model from training.
Could you please provide some guidance on how to resolve this gradient issue when using larger batch sizes?
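While waiting for a fix, one generic way to localize the problem is to check each parameter's gradient for NaN/Inf right after `backward()` and clip the global gradient norm before the optimizer step. This is a hedged sketch (the model, input shapes, and clipping threshold are placeholders, not VideoMamba code):

```python
import torch

model = torch.nn.Linear(16, 4)           # placeholder for the actual model
x = torch.randn(8, 16)                   # placeholder batch
loss = model(x).pow(2).mean()
loss.backward()

# Report any parameter whose gradient contains NaN/Inf, so the
# offending layer can be identified before the optimizer step.
for name, p in model.named_parameters():
    if p.grad is not None and not torch.isfinite(p.grad).all():
        print(f"non-finite gradient in {name}")

# Clipping the global gradient norm is a common guard against the
# occasional exploding step that shows up at larger batch sizes.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

If the printed parameter names always point at the same layer, that narrows down whether the NaNs originate in the Mamba blocks themselves or elsewhere in the fine-tuning head.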