Closed by aliceinland 2 months ago
Hi,
I think one possible reason is that you're training in float16. Values can sometimes exceed the representable range of float16 and overflow to Inf/NaN. Using float32 should solve it.
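A minimal illustration of that overflow mechanism in plain PyTorch (the value is arbitrary, just chosen to exceed float16's maximum of ~65504):

```python
import torch

x = torch.tensor(70000.0)   # representable in float32
h = x.half()                # overflows float16 to inf
print(h)                    # tensor(inf, dtype=torch.float16)
print(h - h)                # tensor(nan, dtype=torch.float16): inf - inf
```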
Hello, thank you for the reply! I am training in float32 (with the 'high' matmul precision setting) and I still get NaN values.
Here is the line of code that I added, in addition to passing --precision 32 to the parser:
torch.set_float32_matmul_precision('high')
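For context, a minimal sketch of how the two settings fit together, assuming a PyTorch Lightning-style setup (the Trainer arguments here are illustrative, not taken from the repository):

```python
import torch
import pytorch_lightning as pl

# Allow TF32 matmul kernels for float32 tensors on supported GPUs.
torch.set_float32_matmul_precision('high')

# Equivalent to passing --precision 32 on the command line (assumed flag):
# train end-to-end in full float32 instead of 16-bit mixed precision.
trainer = pl.Trainer(precision=32)
# trainer.fit(model, datamodule=datamodule)  # model/datamodule defined elsewhere
```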
What is the batch size you're using?
The batch size was 8. Switching to a different GPU model solved the issue! Thank you :)
Dear author,
I was trying to replicate your results. I downloaded your dataset and followed the instructions on the GitHub page. However, before reaching the end of epoch 0, around step 39K, the loss goes to NaN. I did not modify any of the predefined parameters in the code. Do you have any suggestions as to what could be causing this? The same issue is not present when I train BA-TFD.
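In case it helps others hitting the same thing, one generic way to localize where the NaN first appears is PyTorch's anomaly detection (a debugging sketch, not specific to this repository):

```python
import torch

# Raises an error at the first backward-pass op that produces NaN/Inf, so the
# stack trace points at the offending layer. Slow; enable only for debugging.
torch.autograd.set_detect_anomaly(True)
```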