MIC-DKFZ / nnUNet

Apache License 2.0

Does this code of nnUNet work well? #16

Closed zxyyxzz closed 4 years ago

zxyyxzz commented 5 years ago

Hi FabianIsensee: What incredible and outstanding work!~ You are one of the best researchers in the segmentation area. Do you still remember me? It's Xinyu. Oh my god, when I read your nnU-Net paper tonight it made me so excited!~ It will be a strong baseline for U-Net. Does it work now?

Best Xinyu

FabianIsensee commented 5 years ago

oh HI, it's you =) Happy to hear you enjoy our work! I hope this repository is helpful to you! Yes it works ;-) All the results from the paper were made with this code. Best, Fabian

zxyyxzz commented 5 years ago

Hi Fabian:

1. I found a strange piece of code in your implementation, in network_trainer.py (network training):

        if self.val_eval_criterion_MA > self.best_val_eval_criterion_MA:
            self.best_val_eval_criterion_MA = self.val_eval_criterion_MA
            self.print_to_log_file("saving best epoch checkpoint...")
            self.save_checkpoint(join(self.output_folder, "model_best.model"))

    Shouldn't the condition be self.val_eval_criterion_MA < self.best_val_eval_criterion_MA? Because best_val_eval_criterion_MA (the val loss) should be the smallest, right?

2. When you step the learning rate, you use train_loss_MA. Why don't you use the val loss (self.val_eval_criterion_MA)? Maybe the val loss would work better than the train loss.

3. I also noticed that the hyperparameters differ from the ones you used for BraTS2018; for example, the patch size is larger than in BraTS2018. Did you re-optimize the hyperparameters and preprocessing steps relative to BraTS2018, so that these settings are better than the BraTS2018 ones? Hope you can reply :) Thanks!~ Best Xinyu

zxyyxzz commented 5 years ago

@FabianIsensee

FabianIsensee commented 5 years ago

1. I found a strange piece of code in your implementation, in network_trainer.py: the best checkpoint is saved when self.val_eval_criterion_MA > self.best_val_eval_criterion_MA. Shouldn't the condition be self.val_eval_criterion_MA < self.best_val_eval_criterion_MA, because best_val_eval_criterion_MA (the val loss) should be the smallest, right?

The eval metric is the global average foreground Dice, so higher is better. That is the green line in the plots.

2. When you step the learning rate, you use train_loss_MA. Why don't you use the val loss (self.val_eval_criterion_MA)? Maybe the val loss would work better than the train loss.

Because if I did I would be overfitting in the cross-validation. Training loss works fine ;-)

3. I also noticed that the hyperparameters differ from the ones you used for BraTS2018; for example, the patch size is larger than in BraTS2018.

This code was not used for BraTS2018. So yes, it is different :-)

Best, Fabian

zxyyxzz commented 5 years ago

Hi Fabian:

Yeah, I know what you mean, but why not use a validation loss computed the same way as the training loss (the sum of the cross-entropy loss and the Dice loss) to step the learning rate, like you did for BraTS2018? Maybe it would be more robust.

Best Xinyu

FabianIsensee commented 5 years ago

Hi, I am not quite sure what you mean

1. Yeah, I know what you mean, but why not use a validation loss computed the same way as the training loss (the sum of the cross-entropy loss and the Dice loss) to step the learning rate, like you did for BraTS2018? Maybe it would be more robust.

I am using the training loss for determining when to drop the learning rate. I am not using the validation loss because that would be overfitting. While doing cross-validation I need to completely ignore the validation set.
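To make the mechanism concrete, here is a minimal sketch of learning-rate dropping driven by an exponential moving average of the training loss. The class name, default values and patience handling are illustrative assumptions, not the actual nnU-Net implementation:

    # Sketch: drop the LR when the moving average of the *training* loss stalls.
    # Hypothetical names/values; not the actual nnU-Net code.
    class MovingAverageLRDropper:
        def __init__(self, optimizer, alpha=0.93, patience=30, factor=0.2, eps=5e-4):
            self.optimizer = optimizer
            self.alpha = alpha          # smoothing factor of the moving average
            self.patience = patience    # epochs without improvement before dropping the LR
            self.factor = factor        # multiply the LR by this factor when dropping
            self.eps = eps              # minimum improvement that counts
            self.train_loss_MA = None
            self.best_MA = None
            self.epochs_without_improvement = 0

        def step(self, train_loss):
            # update the exponential moving average of the training loss
            if self.train_loss_MA is None:
                self.train_loss_MA = train_loss
            else:
                self.train_loss_MA = self.alpha * self.train_loss_MA + (1 - self.alpha) * train_loss
            # drop the learning rate if the MA has not improved for `patience` epochs
            if self.best_MA is None or self.train_loss_MA < self.best_MA - self.eps:
                self.best_MA = self.train_loss_MA
                self.epochs_without_improvement = 0
            else:
                self.epochs_without_improvement += 1
                if self.epochs_without_improvement >= self.patience:
                    for group in self.optimizer.param_groups:
                        group['lr'] *= self.factor
                    self.epochs_without_improvement = 0

The validation split never enters this schedule, which is the point made above.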

2. Just as you said, the eval metric is the global average foreground Dice, but in your code the function self.run_online_evaluation(output, target) called in run_iteration() of network_trainer.py is not implemented, so self.all_val_eval_metrics would not exist, which means len(self.all_val_eval_metrics) == 0.

If that does not exist then nnUNet should revert back to using the validation loss. Then of course lower is better. Let me double check - I hope there is no bug there.

Best, Fabian

FabianIsensee commented 5 years ago

You are right, there seems to be a bug here:

    if len(self.all_val_eval_metrics) == 0:
        self.val_eval_criterion_MA = self.val_eval_criterion_alpha * self.val_eval_criterion_MA + \
                                     (1 - self.val_eval_criterion_alpha) * self.all_val_losses[-1]

There should be a - in there. I will fix this, thanks for pointing it out!
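For readers hitting the same line, the fix presumably negates the validation-loss term so that "higher is better" still holds for the criterion when it falls back to the loss. A sketch of the corrected fallback (not necessarily the exact committed code):

    if len(self.all_val_eval_metrics) == 0:
        # online eval metric unavailable -> fall back to the validation loss;
        # negate it because lower loss is better, but the MA criterion assumes higher is better
        self.val_eval_criterion_MA = self.val_eval_criterion_alpha * self.val_eval_criterion_MA - \
                                     (1 - self.val_eval_criterion_alpha) * self.all_val_losses[-1]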

zxyyxzz commented 5 years ago

Hi Fabian: 1. What I mean is that you did use the val loss to drop the learning rate in BraTS2018, as your article (MICCAI 2018, BraTS2018) says: "The training is terminated early if the exponential moving average of the validation loss (α = 0.95) has not improved within the last 60 epochs" and the learning rate "is reduced by factor 5 whenever the above mentioned moving average of the validation loss has not improved in the last 30 epochs". So why not keep it consistent with BraTS2018, is there any reason? If using the val loss to drop the learning rate would be overfitting in this code, then using the val loss in BraTS2018 would also be overfitting, right?

2. Another question is about the metadata in your batchgenerators-master: the metadata is only used for testing, not for training or the training loss, right?

Hope you can understand what I mean, because my English is poor. :) Best Xinyu

FabianIsensee commented 5 years ago

Hi there,

1. What I mean is that you did use the val loss to drop the learning rate in BraTS2018, as your article (MICCAI 2018, BraTS2018) says: "The training is terminated early if the exponential moving average of the validation loss (α = 0.95) has not improved within the last 60 epochs" and the learning rate "is reduced by factor 5 whenever the above mentioned moving average of the validation loss has not improved in the last 30 epochs". So why not keep it consistent with BraTS2018, is there any reason? If using the val loss to drop the learning rate would be overfitting in this code, then using the val loss in BraTS2018 would also be overfitting, right?

You are right, I did it this way in BraTS2018. That was a mistake (in fact my early implementation of nnU-Net had that as well). To be honest, this really does not have a measurable effect anyway. It is just cleaner this way.

Another question is about the metadata in your batchgenerators-master: the metadata is only used for testing, not for training or the training loss, right?

Not quite sure what you mean. The pkl files that are generated alongside the npz files? These metadata are needed to restore the geometry of the results, so that when you load both the image and the predicted segmentation in a medical image viewer (such as Slicer or MITK) the segmentation will overlay properly with the image. These metadata are not used for training.
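As an illustration of what restoring the geometry means, here is a minimal sketch using SimpleITK; the file names are hypothetical and this is not the actual nnU-Net export code, just the general idea of copying origin, spacing and direction from the original image:

    import numpy as np
    import SimpleITK as sitk

    # Hypothetical example: write a predicted segmentation (a numpy array in voxel space)
    # back to NIfTI so it overlays correctly on the original image in Slicer/MITK.
    image = sitk.ReadImage("BRATS_001_0000.nii.gz")                       # original image (assumed name)
    pred = np.zeros(sitk.GetArrayFromImage(image).shape, dtype=np.uint8)  # placeholder prediction

    seg = sitk.GetImageFromArray(pred)
    seg.CopyInformation(image)   # restore origin, spacing and direction (the "metadata")
    sitk.WriteImage(seg, "BRATS_001_pred.nii.gz")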

Your English is good - don't worry about it ;-) Best, Fabian

zxyyxzz commented 5 years ago

1. So using the val loss to drop the learning rate is the wrong method, right? But you still achieved 2nd place in BraTS2018 with that method, why is that? Would it not be overfitting on BraTS2018?

2. So the metadata is only used for the online evaluation stage, right?

Thanks for the encouragement about my English :)

Best Xinyu

FabianIsensee commented 5 years ago

Like I said, I think it does not matter in most cases. It is just cleaner to do this via the train loss.

The metadata is used only for saving the predicted segmentations back to nifti which is done at the very end. See nnUNetTrainer.validate()

Best, Fabian

zxyyxzz commented 5 years ago

Hi Fabian: 1. Yes, maybe I know what you mean; "more clean" means that it is better than the val loss in most cases, right? 2. How long does it take you to train a BraTS2018 model (500 epochs)? It seems very slow on my computer... :-) Best Xinyu

FabianIsensee commented 5 years ago

Hi,

1. Yes, maybe I know what you mean; "more clean" means that it is better than the val loss in most cases, right?

From a methodological point of view it is ALWAYS better to use train_loss. If you look at metrics it does not matter

2. How long does it take you to train a BraTS2018 model (500 epochs)? It seems very slow on my computer...

That takes ~3 days I think. Good results take time. If it takes longer than 5 days on your PC then you should investigate. What GPU are you using, what CPU? How many CPU cores per GPU? BraTS is quite CPU intensive because of the 4 modalities and 4 labels

Best, Fabian

zxyyxzz commented 5 years ago

Hi: You said it takes 3 days. Is that for the 5-fold cross-validation or just one model?

Best, Xinyu

FabianIsensee commented 5 years ago

One 3d_fullres model on one GPU takes ~3 days. I always run cross-validation in parallel (5 GPUs, one for each fold). Best, Fabian

zxyyxzz commented 5 years ago

Thanks for your reply!~ :) Best, Xinyu

zxyyxzz commented 5 years ago

Hi: It is really weird, the val loss is negative, as shown below, and the train loss is negative too:

    epoch: 5
    2019-06-15 07:09:15.585560: train loss : 0.1897
    2019-06-15 07:23:44.189072: val loss (train=False): -0.0493
    2019-06-15 07:23:44.392078: This epoch took 5353.590916 s
    2019-06-15 07:23:44.454762: Val glob dc per class: [0.0, 0.6354949015375244, 0.7820830626880708]
    2019-06-15 07:23:51.112532: lr is now (scheduler) 0.0003
    2019-06-15 07:23:51.144324: current best_val_eval_criterion_MA is 0.02240
    2019-06-15 07:23:51.176101: current val_eval_criterion_MA is 0.0615
    2019-06-15 07:23:51.177163: saving best epoch checkpoint...
    2019-06-15 07:23:51.225156: saving checkpoint...
    2019-06-15 07:23:53.605782: done, saving took 2.43 seconds
    2019-06-15 07:23:53.693536: New best epoch (train loss MA): 0.7278
    2019-06-15 07:23:53.694889: Patience: 0/60
    2019-06-15 07:23:53.695926:

Best Xinyu

zxyyxzz commented 5 years ago

Is the reason that your soft Dice is negative? So soft Dice + cross-entropy can become negative as the loss drops, right? @FabianIsensee

FabianIsensee commented 5 years ago

Hi, yes the loss gets negative. Dice loss ranges from 0 to -1 and CE from INF to 0. The best loss is -1. Best, Fabian
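To make the ranges concrete, here is a minimal sketch of a loss of that form (soft Dice plus cross-entropy); the shapes and class handling are assumptions and this is not the exact nnU-Net loss:

    import torch
    import torch.nn.functional as F

    def soft_dice_ce_loss(logits, target, smooth=1e-5):
        """logits: (B, C, X, Y, Z); target: (B, X, Y, Z) integer (long) labels.
        The Dice term is negated, so it lies in [-1, 0]; CE lies in [0, inf).
        The best achievable total loss is therefore -1."""
        num_classes = logits.shape[1]
        probs = F.softmax(logits, dim=1)
        target_onehot = F.one_hot(target, num_classes).permute(0, 4, 1, 2, 3).float()

        # soft Dice over the foreground classes (background channel 0 excluded)
        axes = (0, 2, 3, 4)
        intersect = (probs * target_onehot).sum(axes)
        denom = probs.sum(axes) + target_onehot.sum(axes)
        dice = (2 * intersect + smooth) / (denom + smooth)
        dice_loss = -dice[1:].mean()               # in [-1, 0]

        ce_loss = F.cross_entropy(logits, target)  # in [0, inf)
        return dice_loss + ce_loss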

zxyyxzz commented 5 years ago

Hi Fabian: Thanks for your reply :-) Your coding ability is excellent, and the code is complicated for me to understand, haha. How did you improve your coding ability? I hope I can reach your level of coding in the future. :-)

1. I read your validate function and the _internal_predict_3D_3Dconv_tiled function in neural_network.py, and there are some confusing things about this function: (1) What does tile_in_z mean? (2) What does result_numsamples mean? (3) What does add_torch mean? (4) Why do you use a Gaussian for testing? (5) What do xsteps, ysteps and zsteps mean? (6) What does regions_class_order mean?

2. The other question about testing: you test the image patch by patch, so do you think testing on the full image would not give good performance?

I love your code, so I have several questions; I hope you can help me. I hope I can improve my coding ability by studying your methods, because it really helps me. I am new to computer science. :-)

Thanks again!~ :)

Best Xinyu

FabianIsensee commented 5 years ago

Hi Xinyu, I have never learned proper software development. If you want to improve, my code is probably not where you should start. For improving I recommend reading through the PyTorch tutorials or even the PyTorch code itself - that one is really well done. About your questions, replying to each of these is going to take really long and I would like to postpone that to a later date. You have come quite deep into the code and these things are not properly documented yet - so it's unsurprising that you have trouble understanding what is going on. Please give me some time to fill in the missing documentation (that will take a while though, as I am currently busy with other things). Best, Fabian

zxyyxzz commented 5 years ago

Hi Fabian: Sorry about this; maybe I should ask simpler questions, because you are busy too.

Thanks for your reply :)

Best Xinyu

FabianIsensee commented 5 years ago

I updated the documentation in a recent commit. You should find some information there that can point you in the right direction ;-) Best, Fabian

FabianIsensee commented 5 years ago

Can I close this issue? Best, Fabian

zxyyxzz commented 5 years ago

Hi Fabian: Sorry to reply late; I have read your newest code today, but I still have some questions. I trained nnU-Net on BraTS2018: 1. (1) Why do you use a sliding window for testing? It takes more time for testing. (2) Why are there stitching artifacts? This really confuses me. 2. I use the 2017/2018 example from batchgenerators, and I trained an nnU-Net and then tested the data. If I set do_mirror = True the Dice is close to 0, but if I set do_mirror = False everything is OK. Why is that? Really confusing.

Hope you can help me, and have a nice day :-) 3. When I train for 500 epochs there may be overfitting: the Dice of some validation cases of BraTS2018 is close to 0 or equal to 0, but the Dice on the training data is good. Why is that?

Best Xinyu

zxyyxzz commented 5 years ago

@FabianIsensee

FabianIsensee commented 5 years ago

Hi, no need to ping me. I avoid work as much as I can on weekends so my responses will always be a little delayed. It's kind of a self-protection mechanism ;-)

(1) Why do you use a sliding window for testing? It takes more time for testing.

Try fitting 500x500x500 Liver data fully convolutionally into your GPU. That's why :-)

(2) Why are there stitching artifacts? This really confuses me.

Set use_gaussian=False and step=1 to see what I mean by that
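For intuition, a Gaussian importance map down-weights predictions near the tile borders, so when overlapping tiles are averaged the unreliable border voxels contribute little and the stitching seams disappear. A minimal sketch (parameter names are illustrative, not the actual nnU-Net implementation):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_importance_map(patch_size, sigma_scale=1 / 8):
        """Weight map that is 1 at the patch centre and decays towards the borders."""
        w = np.zeros(patch_size, dtype=np.float32)
        w[tuple(s // 2 for s in patch_size)] = 1.0                      # impulse at the centre
        w = gaussian_filter(w, sigma=[s * sigma_scale for s in patch_size])
        return w / w.max()

    # When stitching overlapping tiles: aggregated = sum_i(w * pred_i) / sum_i(w),
    # so border voxels barely influence the final segmentation.
    weights = gaussian_importance_map((128, 128, 128))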

2. I use the 2017/2018 example from batchgenerators, and I trained an nnU-Net and then tested the data. If I set do_mirror = True the Dice is close to 0, but if I set do_mirror = False everything is OK. Why is that? Really confusing.

Do you use mirroring in training as well? If not then this is the problem. Can you please provide all the parameters you use for validate()?

3. When I train for 500 epochs there may be overfitting: the Dice of some validation cases of BraTS2018 is close to 0 or equal to 0, but the Dice on the training data is good. Why is that?

Overfitting is normal. BraTS is sometimes inconsistent in how images are annotated. For the training data the network can learn these cases by heart so the dice is good, but it cannot know how the validation data is annotated. You will find that the dice score of 0 is often for the necrosis or enhancing tumor label

Best, Fabian

zxyyxzz commented 5 years ago

Hi Fabian: Thanks for your answer, it is helpful for me. I use the same model and the same parameters for training and testing; the only difference from nnU-Net is the dataloader. I use the 2017/2018 example from batchgenerators to load and preprocess the training data, I did use mirroring, and I used everything from the 2017/2018 example. But when I test, all Dice values (enhancing, necrosis, complete) of some cases are equal to 0 or close to 0.

Best Xinyu

FabianIsensee commented 5 years ago

Hi, could be a normalization issue or just one of MANY other things. Please use nnU-Net in its entirety and not just some of its components ;-) If you change something you may overlook things that are important. I really can't say from here what the problem may be. Best, Fabian

zxyyxzz commented 5 years ago

Hi Fabian: Thanks for your help, it really helps me. I will try to find out why it doesn't work with the 2017 example. Thanks again!!!~~ :) Have a nice day~

Best Xinyu

zxyyxzz commented 5 years ago

Oh, the last question: which performs better, sliding window or full image?

Best Xinyu

FabianIsensee commented 5 years ago

If you want to be sure to get the best results then you use sliding window. In general: the less you change (probably) the better :-) Best, Fabian

zxyyxzz commented 5 years ago

Thanks~~~~:-)

Best Xinyu

FabianIsensee commented 5 years ago

I am not saying you should not experiment with things and try to make it better of course. But before you go ahead and make changes, run everything as is first to have a proper baseline. Only then change things :-)

zxyyxzz commented 5 years ago

Yes, you are right, I should get everything running properly first. I changed things first before, so I ran into so many troubles. Thanks for your advice, I will do it :)

Best Xinyu

zxyyxzz commented 5 years ago

Hi FabianIsensee: I have a new problem. My CPU has 48 cores; if I run two or more trainings on my server, the server gets overloaded. If I reduce the number of processes my code uses, it becomes very slow. Is there any solution to this problem?

Best Xinyu :)

FabianIsensee commented 5 years ago

Hi, please run the code with

OMP_NUM_THREADS=1 python run/run_training.py ...

Best, Fabian

zxyyxzz commented 5 years ago

OMP_NUM_THREADS=1

Hi: If I set THREADS=1, won't the code become very slow? Best Xinyu

FabianIsensee commented 5 years ago

No. This just prevents numpy from messing things up. This will not reduce the amount of processes used for data augmentation. It will speed your training up. Best, Fabian

zxyyxzz commented 5 years ago

So it just reduces the number of threads. Yes, I saw so many threads here. Oh, I will try it. Thanks :-) Best Xinyu

FabianIsensee commented 5 years ago

Numpy will use as many threads for (for example) matrix multiplications as your system has available. What numpy does not know is that there are 12 workers for data augmentation. If each of these 12 workers now uses 48 threads for every simple matrix operation then this creates way too much overhead. We want each background worker to use only one thread. This is what this does
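The same limits can also be set from inside a script, as long as it happens before numpy is imported in each process; setting them on the command line as shown above has the same effect:

    import os

    # Limit the threads that BLAS/OpenMP backends may spawn per process.
    # Must be set before numpy (or anything that imports it) is loaded.
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["MKL_NUM_THREADS"] = "1"
    os.environ["OPENBLAS_NUM_THREADS"] = "1"

    import numpy as np  # noqa: E402  (imported after setting the environment on purpose)

    # With e.g. 12 data augmentation workers, each worker now uses one BLAS thread
    # instead of one per CPU core, avoiding 12 x 48 threads of oversubscription.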

zxyyxzz commented 5 years ago

Hi FabianIsensee: How do you know so much? I really admire you. How did you get this knowledge? I feel like a foolish guy. Oh my god. Thanks

FabianIsensee commented 5 years ago

Thank you, but I am sure anybody who has spent three years with these kinds of problems will know when and why their code does not perform as it should. These things are just stuff you learn over time, so keep going ;-)

zxyyxzz commented 5 years ago

Hi Fabian: Do you know how to use three sigmoids to optimize the three regions? Any code to refer to? :-) Best Xinyu

FabianIsensee commented 5 years ago

Hi Xinyu, I implemented that last year for our BraTS2018 participation but that was a different project. I don't have readily available code to do this in nnU-Net, but you can implement it yourself relatively quickly:

1. change the loss so that the CE part is replaced by BCE and the Dice loss uses sigmoid, not softmax
2. change the number of output classes of the U-Net to match the number of regions
3. implement custom Transforms and add them to the data augmentation pipeline so that they convert the segmentation maps into the region maps (one-hot encoded); see the sketch below
4. make sure all the segmentation channels are passed to the network and that the loss properly deals with that

Best, Fabian
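As a hedged illustration of step 3, a custom transform could convert the BraTS label map into overlapping region channels (whole tumor, tumor core, enhancing tumor). The label convention and the batchgenerators-style interface below are assumptions, not existing nnU-Net code:

    import numpy as np

    class ConvertSegToRegionsTransform:
        """Turn an integer label map (b, 1, x, y, z) into one-hot region maps (b, 3, x, y, z).
        Assumed BraTS-style labels: 1 = necrosis, 2 = edema, 4 = enhancing tumor."""
        def __init__(self, regions=((1, 2, 4), (1, 4), (4,)), seg_key="seg"):
            self.regions = regions   # whole tumor, tumor core, enhancing tumor
            self.seg_key = seg_key

        def __call__(self, **data_dict):
            seg = data_dict[self.seg_key]
            out = np.zeros((seg.shape[0], len(self.regions), *seg.shape[2:]), dtype=np.float32)
            for i, region in enumerate(self.regions):
                for label in region:
                    out[:, i][seg[:, 0] == label] = 1
            data_dict[self.seg_key] = out
            return data_dict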

zxyyxzz commented 5 years ago

Hi Fabian: 1. Do I just replace CE with BCE and softmax with sigmoid, or does the Dice loss implementation also need to be modified? 3. Why do I need to implement custom Transforms and add them to the data augmentation pipeline? I don't understand this step.

Thank you for your reply :-) Best Xinyu

Thank you for reple:-) Best Xinyu

FabianIsensee commented 5 years ago

Hi, I cannot help you with all this right now, I am sorry. I just got a daughter two days ago and everything is about her right now :-) Quickly about the two points:

1) You need to look deeper into the loss functions to figure this out, I don't know this off the top of my head. Just make sure the nonlinearities are OK (BCE has the sigmoid included, so you should not apply it before this loss; the Dice loss needs to have it applied beforehand).

3) With deep supervision the U-Net will output not one but several segmentation maps at different resolutions. For each resolution the loss needs to be computed, so there needs to be an appropriately downscaled version of the ground truth. You can achieve that via Transforms in the data augmentation pipeline, but you need to implement that yourself (see the sketch below).

Best, Fabian
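To illustrate point 3), here is a sketch of producing downscaled versions of the ground truth for deep supervision, using nearest-neighbour (order 0) resizing so that no new label values are invented; the function name and scale factors are illustrative assumptions:

    import numpy as np
    from skimage.transform import resize

    def downscale_segmentations(seg, scale_factors=((1, 1, 1), (0.5, 0.5, 0.5), (0.25, 0.25, 0.25))):
        """seg: (x, y, z) integer label map. Returns one label map per deep supervision output."""
        outputs = []
        for sf in scale_factors:
            new_shape = [int(round(s * f)) for s, f in zip(seg.shape, sf)]
            outputs.append(resize(seg.astype(float), new_shape, order=0,
                                  preserve_range=True, anti_aliasing=False).astype(seg.dtype))
        return outputs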

zxyyxzz commented 5 years ago

Hi: Your daughter must be very cute and happy because you are a really nice father :) Thanks for your reply, I know what you mean. But if I do not use deep supervision, maybe I do not need to implement custom Transforms for the different resolutions, right?

Best Xinyu

zxyyxzz commented 5 years ago

Hi Fabian: I have a new problem here: one of the global Dice values on validation is close to zero when using only the Dice loss. But when I use the sum of Dice and CE, everything is OK. So is the soft Dice loss correct? The global Dice is as follows: Val glob dc per class: [0.02365254889172594, 0.7808338269560635, 0.8130702233783669] at epoch 8.

Best Xinyu

FabianIsensee commented 5 years ago

Hi, from my experience the Dice loss works, even without the cross-entropy part. You should probably give it more time. This is just epoch 8 :-) If you want it to converge faster, set do_bg=True. This will also increase stability. But it may reduce performance a tiny little bit. Best, Fabian