Audio-WestlakeU / NBSS

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation
MIT License

Loss Nan Value #11

Open PriyankaPaud opened 1 year ago

PriyankaPaud commented 1 year ago

I am getting NaN as the loss value.

I also get a CUDA error while training.

quancs commented 1 year ago

I didn't encounter this problem. Did you use 16-bit precision training?

xxchauncey commented 11 months ago

Is it normal to run into NaN when training with fp16?

quancs commented 11 months ago

Yes, that is normal. You can take a checkpoint from an earlier epoch and continue training with 32-bit precision.
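The recovery step suggested here (go back to a checkpoint saved before the loss became NaN) can be sketched in plain Python. This is only an illustration under assumptions: `EpochRecord`, `last_finite_checkpoint`, the loss values, and the `ckpts/...` paths are all hypothetical and are not part of the NBSS code base.

```python
import math
from typing import NamedTuple, Optional, Sequence


class EpochRecord(NamedTuple):
    """One logged epoch (hypothetical structure for this sketch)."""
    epoch: int
    loss: float
    ckpt_path: str  # hypothetical checkpoint path


def last_finite_checkpoint(history: Sequence[EpochRecord]) -> Optional[EpochRecord]:
    """Return the latest record saved before the first non-finite loss."""
    last_good = None
    for record in sorted(history, key=lambda r: r.epoch):
        if not math.isfinite(record.loss):
            break  # everything from here on may already be corrupted
        last_good = record
    return last_good


# Made-up training log: the loss turns into NaN at epoch 2.
history = [
    EpochRecord(0, 0.91, "ckpts/epoch0.ckpt"),
    EpochRecord(1, 0.55, "ckpts/epoch1.ckpt"),
    EpochRecord(2, float("nan"), "ckpts/epoch2.ckpt"),
    EpochRecord(3, float("nan"), "ckpts/epoch3.ckpt"),
]

resume_from = last_finite_checkpoint(history)
print(resume_from)
```

If training runs through PyTorch Lightning's `Trainer` (an assumption about the setup), the selected path would typically be passed as `ckpt_path` to `Trainer.fit`, with the trainer's `precision` set to 32-bit.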

quancs commented 11 months ago

@xxchauncey You can use bf16. Its performance is slightly worse than fp16, but it rarely runs into NaN.
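One likely reason for this difference, offered as background rather than as something stated in the thread, is numeric range: fp16 overflows above 65504, while bf16 keeps the exponent range of fp32 at the cost of fewer mantissa bits. The sketch below emulates both formats with the standard library only. `to_fp16` and `to_bf16` are hypothetical helper names, and the bf16 emulation truncates instead of rounding to nearest, which real hardware usually does not do.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to fp16; values beyond the fp16 range become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses to pack out-of-range values
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Emulate bf16 by keeping the top 16 bits of the float32 encoding.

    Simplification: this truncates toward zero instead of rounding.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


big = 70000.0  # an intermediate value just above the fp16 range

fp16_big = to_fp16(big)        # overflows to inf
fp16_diff = fp16_big - fp16_big  # inf - inf produces NaN
bf16_big = to_bf16(big)        # stays finite, but coarsely rounded

print("fp16:", fp16_big, "-> inf - inf =", fp16_diff)
print("bf16:", bf16_big)

# Precision near 1.0: fp16 keeps 10 mantissa bits, bf16 only 7.
print("fp16(1.001) =", to_fp16(1.001))
print("bf16(1.001) =", to_bf16(1.001))
```

Once an `inf` appears in an fp16 activation or gradient, an operation such as `inf - inf` yields NaN, which then propagates into the loss. The coarser mantissa of bf16 may be one reason for the slightly lower performance mentioned above, though the thread does not say so.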

xxchauncey commented 11 months ago

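Thanks. I only recently started working on audio separation. Over the past few weeks I tried several different backbones, and all of them produced NaN midway through training. On V100 cards, the only fix was to switch back to 32-bit precision and continue training. I had never run into this with ASR or small NLP models before, so I was curious.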