alexzhou907 / DDBM


Training Problem #7

Open sfwang-auto opened 4 months ago

sfwang-auto commented 4 months ago

I found that "param_norm" grows very quickly when training in "ve" mode. It overflows easily when using mixed precision with a small batch size. I wonder whether the authors have encountered this problem, and how to solve it.
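To make the overflow concrete, here is a minimal sketch in plain Python. It assumes that "param_norm" is the L2 norm over all parameters and that the overflow comes from accumulating the squared norm in half precision. Both are assumptions, not a description of how DDBM actually computes or logs this value. The helper names `param_norm` and `fp16_overflow_risk` are hypothetical. The only fixed fact used is that 65504 is the largest finite IEEE 754 half-precision value.

```python
import math

# Largest finite IEEE 754 half-precision (fp16) value.
FP16_MAX = 65504.0


def param_norm(tensors):
    """Return (L2 norm, squared norm) over all parameters.

    Hypothetical helper: each parameter tensor is given as a flat list of floats.
    """
    sq = sum(x * x for t in tensors for x in t)
    return math.sqrt(sq), sq


def fp16_overflow_risk(tensors):
    """True if the squared norm would exceed FP16_MAX when held in fp16.

    Assumption: the squared norm is accumulated in half precision.
    """
    _, sq = param_norm(tensors)
    return sq > FP16_MAX


if __name__ == "__main__":
    small = [[3.0, 4.0]]
    large = [[300.0]]  # 300**2 = 90000 > 65504
    print(param_norm(small)[0], fp16_overflow_risk(small))
    print(param_norm(large)[0], fp16_overflow_risk(large))
```

A single weight of magnitude about 256 is already enough for its square to exceed the fp16 range. This is why a fast-growing norm can overflow long before the parameters themselves do.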

alexzhou907 commented 4 months ago

I have not encountered this. However, it may be related to sampling times that are too close to the boundaries, e.g. 0 or T: there are singularities at these points that can cause numerical errors.
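One way to act on this suggestion is to keep sampled training times away from both endpoints. The following is a minimal sketch, assuming times are drawn uniformly on [0, T]. The names `sample_times` and `clamp_time` and the margin `eps` are hypothetical, not DDBM's actual sampler or config. It draws from [eps, T - eps] and also clamps, so that floating-point rounding cannot land a sample on a singular endpoint.

```python
import random


def clamp_time(t, T=1.0, eps=1e-4):
    """Clamp a time t into [eps, T - eps], away from the singular endpoints 0 and T."""
    return min(max(t, eps), T - eps)


def sample_times(batch_size, T=1.0, eps=1e-4, rng=None):
    """Draw batch_size training times uniformly from [eps, T - eps].

    eps is a hypothetical safety margin; a suitable value depends on the schedule.
    """
    if not 0 < eps < T / 2:
        raise ValueError("eps must lie in (0, T/2)")
    rng = rng if rng is not None else random.Random()
    # random.uniform may return either endpoint due to rounding, so clamp as well.
    return [clamp_time(rng.uniform(eps, T - eps), T, eps) for _ in range(batch_size)]


if __name__ == "__main__":
    ts = sample_times(5, T=1.0, eps=1e-3, rng=random.Random(0))
    print(ts)
```

The margin trades a small amount of coverage near the boundaries for numerical stability. If the instability persists, a larger `eps` is a cheap first thing to try.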