OpenGVLab / VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0

Multi-GPU training #23

sailor-z opened 2 months ago

sailor-z commented 2 months ago

Hi,

Thanks for releasing the code! I am retraining the model with PyTorch Lightning. Training works perfectly on a single A100 GPU, but I always get a NaN loss when I use multiple GPUs. I am using the DDP strategy, which works well for other methods. What could be the reason?

Andy1621 commented 2 months ago

Hi! Please try using bfloat16 or a smaller learning rate.
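One common source of NaN losses under mixed precision is fp16 overflow, and that would explain why bfloat16 is suggested here. The pure-Python sketch below is only an illustration. It is not taken from the VideoMamba code, and it assumes the NaNs come from fp16 autocast, which the thread does not confirm. It shows that fp16 tops out at 65504, so a large activation becomes `inf`, and `inf - inf` gives `nan`. bfloat16 keeps float32's 8-bit exponent, so the same value survives with reduced precision. The helper names are hypothetical. The bf16 conversion is approximated by truncating the float32 bits, not by the round-to-nearest that real hardware uses.

```python
import math
import struct


def round_trip_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (fp16). Values beyond fp16 range map to +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values that overflow fp16; a GPU cast would produce inf instead
        return math.copysign(math.inf, x)


def round_trip_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping only the top 16 bits of the float32 encoding (truncation)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    activation = 70000.0  # hypothetical large activation, just above the fp16 maximum of 65504
    fp16 = round_trip_fp16(activation)
    bf16 = round_trip_bf16(activation)
    print(fp16)         # inf
    print(fp16 - fp16)  # nan: an overflow like this can propagate into the loss
    print(bf16)         # 69632.0: still finite, only the low mantissa bits are lost
```

In Lightning 2.x, bf16 mixed precision is selected with `Trainer(precision="bf16-mixed")`. Older 1.x releases used `precision="bf16"`.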

FengRui1998 commented 2 months ago

I ran into the same problem: whenever I run on multiple cards, training is always inefficient.

sailor-z commented 2 months ago

Adjusting the learning rate or using bfloat16 doesn't work for me.