Closed by chuxiang93 2 years ago
Hi @mjq93
I am sorry you are running into this. We never encountered a NaN while training UNETR. It is most likely caused by DiceCELoss, as the values in the denominator become very small.
One quick fix is to set squared_pred = False in DiceCELoss. This keeps the predictions, which lie between 0 and 1, from becoming even smaller, although we observed slightly better performance when squaring the predictions.
Another fix would be to pass a larger value, such as smooth_dr = 1e-5, to prevent division by zero. I believe the first fix will work out of the box and is preferred.
Thanks
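To illustrate why both suggestions help, here is a minimal pure-Python sketch of a single-class soft Dice term. It is a simplified model written for this thread, not MONAI's implementation: the function name soft_dice_term and the toy inputs are hypothetical, and the prediction value 1e-200 is chosen only so that squaring underflows to zero in 64-bit floats (in float32 or mixed precision the same underflow would happen at far less extreme values). Only the parameter names squared_pred, smooth_nr and smooth_dr mirror the DiceCELoss arguments mentioned above.

```python
import math


def soft_dice_term(pred, target, squared_pred, smooth_nr=0.0, smooth_dr=0.0):
    """Simplified soft Dice loss for one class over flat lists (illustrative only)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    if squared_pred:
        # Squaring values in (0, 1) makes them smaller and can underflow to 0.0.
        pred_sum = sum(p * p for p in pred)
        target_sum = sum(t * t for t in target)
    else:
        pred_sum = sum(pred)
        target_sum = sum(target)
    numerator = 2.0 * intersection + smooth_nr
    denominator = pred_sum + target_sum + smooth_dr
    if denominator == 0.0:
        # Plain Python raises ZeroDivisionError here; tensor libraries
        # typically produce NaN for 0/0, so the sketch returns NaN instead.
        return math.nan
    return 1.0 - numerator / denominator


# Toy case: the class is absent from the patch and the network is very confident.
pred = [1e-200] * 4
target = [0.0] * 4

unstable = soft_dice_term(pred, target, squared_pred=True)                # 0/0
fix_one = soft_dice_term(pred, target, squared_pred=False)                # no squaring
fix_two = soft_dice_term(pred, target, squared_pred=True, smooth_dr=1e-5) # smoothed denominator

print(unstable, fix_one, fix_two)
```

In this toy case the squared, unsmoothed variant has a zero denominator, while either disabling the squaring or adding a nonzero smooth_dr keeps the result finite. Whether the same mechanism is behind a given NaN in real training is an assumption that should be checked against the actual loss inputs.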
@mjq93 I ran into the same issue: the loss became NaN during training. Neither of the two fixes worked for me. How did you solve it in the end?
Thank you, I have solved it
Describe the bug
A NaN appeared while I was training the UNETR network:
WARNING:root:NaN or Inf found in input tensor.