ghtwht opened this issue 1 year ago
It's easy to run into a 'nan' error when training the translation model with 'transformer_base'. Have you ever encountered this problem, and how did you deal with it?
Could you try with --dtype float32 to disable mixed-precision training?
--dtype float32
dtype already defaults to float32, so mixed precision was not enabled in the first place.
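For context on why disabling mixed precision was suggested: in fp16 training, NaN/inf usually comes from overflow, and the common mitigation is dynamic loss scaling, which skips any step whose gradients are non-finite and shrinks the scale. The same "check for non-finite values and skip the update" guard is also useful in float32 when a too-high learning rate makes the loss diverge. Below is a minimal, framework-agnostic sketch of that idea in plain Python. `DynamicLossScaler` is a hypothetical helper, not part of the toolkit discussed in this thread; its parameter names are loosely modeled on PyTorch's `GradScaler`, and `min_scale` is an assumption of this sketch.

```python
import math


class DynamicLossScaler:
    """Hypothetical helper: a minimal dynamic loss scaler of the kind used
    in fp16 mixed-precision training (not any specific library's API)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000, min_scale=1.0):
        self.scale = init_scale            # multiply the loss by this before backward
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.min_scale = min_scale         # assumption: never shrink below this
        self._good_steps = 0

    def update(self, grads):
        """Given the (scaled) gradient values of one step, return True if the
        optimizer step should be applied, False if it should be skipped."""
        if all(math.isfinite(g) for g in grads):
            self._good_steps += 1
            # After enough consecutive finite steps, try a larger scale.
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0
            return True
        # Overflow (inf) or nan: skip this update and back off the scale.
        self.scale = max(self.scale * self.backoff_factor, self.min_scale)
        self._good_steps = 0
        return False


if __name__ == "__main__":
    scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
    print(scaler.update([0.1, float("nan")]), scaler.scale)  # False 4.0
    print(scaler.update([0.1, 0.2]), scaler.scale)           # True 4.0
    print(scaler.update([0.3, 0.4]), scaler.scale)           # True 8.0
```

In a real training loop the gradients would be unscaled (divided by `scale`) before the optimizer step, and the finiteness check would run over the framework's gradient tensors rather than a list of floats.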