lqniunjunlper closed 7 months ago
Mamba sometimes runs into NaN loss during training when using fp16 or AMP.
So the question is how to keep the training process stable while using fp16 for efficiency.
Thanks.
Hi, in my experience it is quite stable, even on a single GPU. You need to enable the accelerator's rescaling function to avoid the NaN loss issue.
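For anyone wondering what the rescaling does: the reply presumably means dynamic loss scaling, the idea behind PyTorch's `torch.cuda.amp.GradScaler`. Below is a minimal stdlib-only sketch of that idea, not the actual code of any library. `to_fp16` and `DynamicLossScaler` are hypothetical names made up for this illustration, and the default values are assumptions chosen for the demo.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value (hypothetical helper)."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # Magnitude too large for fp16: model it as +/- infinity.
        return math.copysign(math.inf, x)


class DynamicLossScaler:
    """Illustrative dynamic loss scaler (assumed defaults, not a library API)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.good_steps = 0

    def step(self, scaled_grads):
        """Unscale gradients, or return None to signal a skipped step."""
        if any(not math.isfinite(g) for g in scaled_grads):
            # Overflow (inf/NaN) detected: shrink the scale, skip the update.
            self.scale *= self.backoff_factor
            self.good_steps = 0
            return None
        used_scale = self.scale
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            # A long run of clean steps: try a larger scale again.
            self.scale *= self.growth_factor
            self.good_steps = 0
        return [g / used_scale for g in scaled_grads]


# A tiny gradient underflows to zero in fp16 without scaling...
true_grad = 1e-8
unscaled_fp16 = to_fp16(true_grad)

# ...but survives when the loss (and so the gradient) is scaled first.
scaler = DynamicLossScaler()
scaled_fp16 = to_fp16(true_grad * scaler.scale)
recovered = scaler.step([scaled_fp16])[0]

# An overflowing gradient makes the scaler skip the step and back off.
skipped = scaler.step([to_fp16(1e5)])
```

In this sketch, `unscaled_fp16` is exactly zero while `recovered` stays close to the true gradient, and the overflowing step returns `None` and halves the scale. Skipping the update when inf or NaN shows up is what keeps a single bad step from poisoning the weights and turning the loss into NaN.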