Closed — damiankucharski closed this issue 1 year ago
This should be equivalent to choosing a much higher learning rate. At some point the learning rate becomes too high and the model diverges. I would recommend setting the weights such that the average weight the model sees (across all samples) is 1.
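A minimal sketch of that normalization, assuming the confidence-derived weights are available as a tensor (the function name here is hypothetical, not part of nnUNet):

```python
import torch

def normalize_confidence_weights(weights: torch.Tensor) -> torch.Tensor:
    """Rescale weights so their mean over all samples/voxels is 1.

    This keeps the overall loss (and gradient) magnitude comparable to
    unweighted training while preserving the relative emphasis between
    high- and low-confidence annotations.
    """
    return weights / weights.mean().clamp_min(1e-8)

# Example: confidence values in [0.5, 1.0] for a batch of 3D volumes
confidences = torch.rand(2, 1, 64, 64, 64) * 0.5 + 0.5
w = normalize_confidence_weights(confidences)
print(w.mean())  # ~1.0
```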
Hi @damiankucharski, were you able to resolve this issue back then? If you still encounter problems, please let us know! Otherwise, we'll close this issue in roughly 2 weeks.
Hello, I am using nnUNet in a liver tumor segmentation project. My ground truth comes with a confidence value for the annotation of each class. I tried to incorporate loss weighting such that misclassifying voxels with high-confidence annotations impacts the loss more than misclassifying those with low confidence. I am using the DC_and_CE loss with MultipleOutputLoss2 for multiclass support. Only the Dice term is multiplied by the respective weights. After experimenting a little with different multipliers for specific confidence values, I noticed that when the weights are very high, at some point the loss becomes NaN and the network stops training. While very high weights effectively remove the CE part from the loss, I am interested in why multiplying the loss by a high but still reasonable value produces NaNs at some point. Below I attach the network training graph; you can see that both the training and validation losses quickly become NaN and the evaluation metric falls to zero. In this case the Dice term was multiplied by 100 for every class and training example, so it is basically weighting the whole DC part of the loss by 100. Do you have any idea why this may happen? A sketch of the kind of weighting I mean is shown below.
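For reference, a weighted Dice + cross-entropy combination along these lines might look like the sketch below. This is not nnUNet's DC_and_CE implementation; the class name and the way the weight enters the Dice term are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class WeightedDiceCELoss(nn.Module):
    """Soft Dice term scaled by a confidence-derived weight, plus unweighted CE.

    Illustrative only: not nnUNet's DC_and_CE loss. `dice_weight` can be a
    scalar (e.g. 100, as in the question) or a per-case tensor of shape (B,).
    """

    def __init__(self, smooth: float = 1e-5):
        super().__init__()
        self.smooth = smooth
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits: torch.Tensor, target: torch.Tensor,
                dice_weight: torch.Tensor) -> torch.Tensor:
        # logits: (B, C, ...) raw network outputs
        # target: (B, ...) integer class labels
        probs = torch.softmax(logits, dim=1)
        target_onehot = torch.zeros_like(probs).scatter_(1, target.unsqueeze(1), 1.0)

        spatial_dims = tuple(range(2, probs.ndim))
        intersection = (probs * target_onehot).sum(spatial_dims)          # (B, C)
        denom = probs.sum(spatial_dims) + target_onehot.sum(spatial_dims)  # (B, C)
        dice = (2 * intersection + self.smooth) / (denom + self.smooth)

        dice_loss = 1 - dice.mean(dim=1)                 # (B,) per-case Dice loss
        weighted_dice = (dice_weight * dice_loss).mean()  # large weights scale gradients too

        return weighted_dice + self.ce(logits, target)
```

Note that multiplying the Dice term by a large constant also scales its gradients by the same factor, which in this illustrative setup has the same effect as raising the learning rate for that part of the loss.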