Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License

'grad_norm' is NaN #82

Open LikeStarting opened 4 months ago

LikeStarting commented 4 months ago

Hi, when training starts, 'grad_norm' becomes NaN. I am using DiffuSeq-v2 with FP16 for GPU acceleration. Where does the problem come from, and how can it be fixed? Thank you! [screenshot of training log showing grad_norm: nan]
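(A common cause, not confirmed for this repo: with FP16, activations or gradients can overflow to inf, which propagates into a NaN grad norm. A minimal PyTorch AMP sketch of a loss-scaled training step, assuming a toy model rather than the DiffuSeq code, looks like this; `GradScaler` detects non-finite gradients and skips the optimizer step instead of corrupting the weights.)

```python
import torch
import torch.nn as nn

# Toy model standing in for the real network; names here are
# illustrative, not from the DiffuSeq codebase.
model = nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
# GradScaler rescales the loss so FP16 gradients don't underflow,
# and skips steps whose gradients came out inf/NaN.
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

def train_step(x):
    opt.zero_grad()
    with torch.autocast("cuda", dtype=torch.float16,
                        enabled=torch.cuda.is_available()):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.unscale_(opt)  # unscale so the true grad norm can be inspected
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(opt)      # automatically skipped if grads are inf/NaN
    scaler.update()
    return grad_norm

norm = train_step(torch.randn(4, 8))
```

If `grad_norm` is NaN on every step even with loss scaling, the NaN is usually produced in the forward pass (e.g. a division or `sqrt` of a non-positive value) rather than by FP16 overflow.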

summmeer commented 3 months ago

I suggest adding gradient monitoring and logging during training to identify which layer(s) or operation(s) produce the NaN.
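(One way to do the monitoring suggested above, sketched with a toy model rather than the DiffuSeq code: register a backward hook on every parameter and record which ones receive a non-finite gradient. The helper name `register_grad_monitor` is made up for this example.)

```python
import torch
import torch.nn as nn

def register_grad_monitor(model):
    """Return a list that collects the names of parameters whose
    gradients contain NaN/Inf during backward()."""
    bad = []
    def make_hook(name):
        def hook(grad):
            if not torch.isfinite(grad).all():
                bad.append(name)
            return grad
        return hook
    for name, param in model.named_parameters():
        if param.requires_grad:
            param.register_hook(make_hook(name))
    return bad

# Demo: force a NaN loss so the hooks fire.
model = nn.Linear(4, 1)
bad = register_grad_monitor(model)
x = torch.randn(2, 4)
(model(x).sum() * float("nan")).backward()  # simulate a NaN loss
print(bad)  # names of parameters that received non-finite grads
```

Running this during real training (without the forced NaN) shows which layer's gradient goes non-finite first, which narrows down the offending operation.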

LikeStarting commented 3 months ago

Thanks a lot, I will try~

X-fxx commented 1 month ago

I'm having the same problem, have you solved it?