GeroVanMi / algorithmic-quartet-mlops

A showcase Machine Learning Operations (MLOps) Project.

Deal with NaN loss issue #28

Closed vollOlga closed 4 months ago

GeroVanMi commented 4 months ago

Note: The current training code causes the model to quickly produce NaN loss. I suspect this could be due to exploding gradients (perhaps a lack of BatchNormalization?)
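If exploding gradients were the cause, clipping the global gradient norm would be a common mitigation. A minimal, framework-free sketch of global-norm clipping (the flat `grads` list and `clip_by_global_norm` helper are hypothetical, for illustration only):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale grads so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A gradient vector with norm 5.0 gets rescaled down to norm 1.0.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

In PyTorch the equivalent built-in is `torch.nn.utils.clip_grad_norm_`, called between `loss.backward()` and `optimizer.step()`.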

GeroVanMi commented 4 months ago

Never mind, this is due to a known issue with the GTX 1660 graphics card series not being able to handle 16-bit floating-point precision: https://github.com/pytorch/pytorch/issues/58123

Loss calculation now works as intended when using 32-bit floating-point precision.
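The underlying failure mode can be illustrated on CPU with NumPy, independent of the GPU: an intermediate value that is routine in float32 overflows float16's maximum of 65504 and the resulting infinities then propagate to NaN. The logit value below is hypothetical, chosen only to trigger the overflow:

```python
import warnings
import numpy as np

logit = np.float64(12.0)  # exp(12) ≈ 162755, well above the fp16 max of 65504

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # silence overflow warnings
    # In float16 the intermediate exp() overflows to inf, and inf - inf -> NaN.
    fp16_exp = np.exp(np.float16(logit))
    fp16_loss = fp16_exp - fp16_exp

# In float32 the same computation stays finite.
fp32_exp = np.exp(np.float32(logit))
fp32_loss = fp32_exp - fp32_exp
```

This is the same pattern as a softmax or cross-entropy intermediate blowing up under mixed precision, which is why switching the training loop to full 32-bit precision makes the NaN loss disappear.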