Closed ZhikangNiu closed 8 months ago
I found that this bug is caused by `laplace_smoothing`. Scaling the epsilon up to 1e-3 and reducing the learning rate fixes it, and training then works well with fp16!
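For anyone wondering why a larger epsilon helps, here is a minimal sketch of one plausible failure path. It is not taken from the repository: the helper names (`to_fp16`, `smoothed_cluster_size`, `normalize`) and the numbers are hypothetical. It emulates float16 rounding with the standard library, applies Laplace smoothing to the cluster sizes, and then divides an embedding sum by the smoothed size of an unused code. The assumption is that the NaN starts as an fp16 overflow in that division.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest float16 value (inf on overflow)."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # float16 cannot represent values much beyond 65504
        return math.copysign(math.inf, x)


def smoothed_cluster_size(counts, eps):
    """Laplace-smooth the per-code counts, keeping the same total mass."""
    total = sum(counts)
    denom = total + len(counts) * eps
    return [to_fp16((c + eps) / denom * total) for c in counts]


def normalize(embed_sum, cluster_size):
    """Codebook update step: embedding sum divided by cluster size, in fp16."""
    return to_fp16(embed_sum / cluster_size)


# Hypothetical state: code 0 received no vectors in this batch,
# but its running embedding sum is still 1.0.
counts = [0, 10, 6]
embed_sum = 1.0

unstable = normalize(embed_sum, smoothed_cluster_size(counts, 1e-5)[0])
stable = normalize(embed_sum, smoothed_cluster_size(counts, 1e-3)[0])

print("eps=1e-5:", unstable)  # overflows the float16 range
print("eps=1e-3:", stable)    # stays finite
```

With eps=1e-5 the smoothed size of the unused code is about 1e-5, so the quotient is around 1e5 and exceeds the float16 maximum of 65504. With eps=1e-3 the quotient is around 1e3 and stays representable. Once an inf is in the codebook, later operations such as inf - inf produce NaN.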
Hello, I have a question: why does a NaN in the VQ update affect the overall loss function?
It will cause the task to fail.
AMP training isn't stable. The loss becomes NaN.
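To illustrate why a single NaN in the VQ part takes down the whole loss, here is a small sketch. The loss terms and the weight are hypothetical, not the repository's actual values. NaN propagates through every arithmetic operation, so once the VQ term is NaN the summed loss is NaN too, and the gradients computed from it are unusable.

```python
import math

# Hypothetical loss terms for one training step.
recon_loss = 0.42            # reconstruction term, finite
commit_loss = float("nan")   # VQ term after the codebook update blew up
commitment_weight = 0.25     # illustrative weight, not from the source

# Any sum or product involving NaN is NaN.
total_loss = recon_loss + commitment_weight * commit_loss

# A simple guard: skip the optimizer step when the loss is not finite.
should_skip_step = not math.isfinite(total_loss)

print(total_loss, should_skip_step)
```

A guard like this only hides the symptom. The NaN keeps coming back until the source in the VQ update is fixed.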