christianpayer / MedicalDataAugmentationTool-VerSe


The loss is very large in the training process #3

Closed. zengchan closed this issue 4 years ago.

zengchan commented 4 years ago

Hi, when I train main_spin_location.py, the loss decreases only very slowly. After training for 10,000 iterations, the loss still fluctuates around 800.

christianpayer commented 4 years ago

Hi, I never experienced the problems you are describing. To check for errors in the script, I ran the scripts again and retrained them on my PC. It seems that the full training as well as the cross validation work fine. I attached the train_csv.txt file for cross validation 0 that goes until iteration 10,000. This is approximately how the training loss should behave.

Did you make changes to the code? You could try changing the training parameters (e.g. reduce the learning rate) to check whether the network converges then. Also, check whether there is a problem with data loading/preprocessing. For this, set self.save_debug_images = True in MainLoop and look at the images that are then generated inside the folder debug_train.
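
In case it helps, here is a minimal sketch of those two checks. The save_debug_images attribute is the one mentioned above; the learning_rate attribute and its value are only placeholders for whatever training parameters your MainLoop actually defines, so adapt them to your copy of the script:

```python
# Minimal sketch of the two suggested checks (not the repository's exact code).
class MainLoop:
    def __init__(self):
        # write the augmented/preprocessed inputs to the folder debug_train
        # so data loading and preprocessing can be inspected visually
        self.save_debug_images = True
        # placeholder: try a smaller learning rate to check whether the
        # network converges then
        self.learning_rate = 1e-5
```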

zengchan commented 4 years ago

Thank you very much for your reply! Maybe I changed the learning rate. But when I trained vertebrae localization with your learning rate, I also encountered the problem of a very large loss, in the range of 800 to 1400 (see attached train_vertebrae_loss).

christianpayer commented 4 years ago

No problem, I hope I can help you. The loss in vertebrae localization is different from the one in spine localization: the number of outputs is different, the output normalization is different, the number of output channels is different, and vertebrae localization additionally minimizes the sigmas of the heatmaps. So you cannot directly compare the losses between the individual tasks. It is also hard to say in general when a loss is too high. For example, in vertebrae localization you can see that, especially in the beginning, loss_sigma decreases while the loss of the heatmap outputs increases. Still, this is to be expected and the loss converges to a reasonable minimum.

So I would not judge the network training by the loss function, but by the predicted outputs. Use the cross validations for that, and look at the evaluation scores (e.g. PE) and the network predictions. I attached a train.csv file of the vertebrae localization for the first cross validation until iteration 20,000 for you to check for differences. If the losses of my run and your run behave completely differently, again check for changes in the code.
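
To illustrate why the absolute loss values are not comparable across the two tasks, here is a rough, hedged sketch of a combined heatmap-plus-sigma objective. The function, weighting, and array shapes are assumptions for illustration only and do not reproduce the repository's exact loss:

```python
import numpy as np

def combined_localization_loss(pred_heatmaps, target_heatmaps, sigmas,
                               sigma_weight=1.0):
    """Illustrative only: heatmap regression term plus a sigma penalty."""
    # mean squared error over all heatmap channels
    loss_heatmap = np.mean((pred_heatmaps - target_heatmaps) ** 2)
    # penalty that encourages small (sharp) learnable heatmap sigmas
    loss_sigma = sigma_weight * np.mean(sigmas ** 2)
    # spine localization only has a heatmap term, so its absolute value
    # cannot be compared with this combined objective
    return loss_heatmap + loss_sigma
```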

zengchan commented 4 years ago

Thank you very much for your prompt reply. In addition, our predictions contain one more category than the labels (see attached result-label). Does this need post-processing? How do you deal with it? Also, the results computed with your metric calculation code come out a little higher than with our own code (see attached calculate_code).

christianpayer commented 4 years ago

No problem, I'm glad I could help. Regarding an additional label, you would need to adapt the code, but probably not many changes will be necessary. You need to change the number of network outputs and the number of landmarks that the dataset generates, but I guess you already figured that out. Regarding postprocessing, it depends on which landmark you additionally added. The postprocessing function SpinePostprocessing assumes that the channels of the predicted heatmaps correspond to vertebrae from top to bottom. If your additional vertebra lies in between other vertebrae, you would need to adapt the SpinePostprocessing code. You could also skip postprocessing completely and just take the maximum response of the heatmaps (i.e. take the first entry of every entry of the local_maxima_landmarks list).
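
If you take that simpler route, a hedged sketch of the "strongest maximum per channel" idea could look like the following. The structure of local_maxima_landmarks (one list of candidates per heatmap channel, sorted by response) is an assumption based on the description above, not the repository's verified data layout:

```python
def strongest_maximum_per_channel(local_maxima_landmarks):
    """Keep only the strongest (first) local maximum of every vertebra channel
    instead of running SpinePostprocessing. Assumes each entry is a list of
    candidate landmarks sorted by heatmap response."""
    return [candidates[0] if len(candidates) > 0 else None
            for candidates in local_maxima_landmarks]
```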