NifTK / NiftyNet

[unmaintained] An open-source convolutional neural networks platform for research in medical image analysis and image-guided therapy
http://niftynet.io
Apache License 2.0

'demo/brats_segmentation.py' generates all-zero outputs while training loss is low #76

Closed kwang-greenhand closed 6 years ago

kwang-greenhand commented 6 years ago

Hi there,

I'm running into a very weird problem with a two-class segmentation task: during training, the loss (generalized Dice) on both the training and validation sets looks reasonable (see figure 1); however, at inference time I always get all-zero results, even on the training set itself!

I think it's very unlikely that the loss could be this good with an all-zero output. Is there possibly a bug?

[figure 1: training and validation loss curves]
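As a sanity check on that intuition, here is a minimal sketch (illustrative only, not NiftyNet's exact implementation) of the generalized Dice loss with squared inverse-volume class weights, evaluated for an all-zero prediction on a toy, heavily imbalanced volume:

import numpy as np

# Minimal sketch, not NiftyNet's exact implementation: generalized Dice loss
# with squared inverse-volume class weights.
def generalized_dice_loss(prob, ref):
    # prob, ref: (num_voxels, num_classes) arrays (softmax output, one-hot reference)
    weights = 1.0 / (np.sum(ref, axis=0) ** 2 + 1e-8)
    intersect = np.sum(weights * np.sum(prob * ref, axis=0))
    union = np.sum(weights * np.sum(prob + ref, axis=0))
    return 1.0 - 2.0 * intersect / union

# Toy volume: 1000 voxels, 50 of them foreground (class 1).
ref = np.zeros((1000, 2))
ref[:, 0] = 1.0
ref[:50, 0], ref[:50, 1] = 0.0, 1.0
all_background = np.zeros_like(ref)
all_background[:, 0] = 1.0
print(generalized_dice_loss(all_background, ref))  # roughly 0.9, i.e. a poor loss

So, under this formulation, an all-zero output should not coincide with a low loss.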

mjorgecardoso commented 6 years ago

Can you please post here your configuration file?

kwang-greenhand commented 6 years ago

To add more information: I'm using 3D volumes. I also tried 2D, and that somewhat works; the results are not good, but I can tell they make some sense. That's why I suspect the generalized Dice loss could not be that low if the output were really all zeros.

PS: here is my config file. I did image-level normalization before, so I don't want volume-level normalization here.

highres3DPlan4_CBF_loni.txt

mjorgecardoso commented 6 years ago

You are using the model checkpoint from iteration 6000 at inference time. In the inference section of your config, can you try setting the following and rerunning the inference only?

[INFERENCE]
spatial_window_size = (16, 16, 16)
output_interp_order = 0
border = (2, 2, 2)
inference_iter = 80000
save_seg_dir = /ifs/loni/groups/loft/KaiWang/Machine_learning_learning/BRATS17/on_new_data/highres3Dplan4/official3/inference

kwang-greenhand commented 6 years ago

The thing is, the first few checkpoints of the model gave me outputs that were not all zero, so I set inference_iter to 6000 to see what happened during training.

I tried your suggestion; sorry, it's still not working...

kwang-greenhand commented 6 years ago

By the way, I tried outputting the probability maps. Isn't it true that the probabilities for the two classes should add up to 1? Why are they both all zero (except for two voxels in class "0")? I'm getting more and more suspicious that there is a bug...
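As a quick sanity check, the saved probability map can be inspected with nibabel (the file name below is hypothetical; substitute whatever was written to save_seg_dir):

import nibabel as nib
import numpy as np

# Hypothetical output file name; use whatever NiftyNet wrote to save_seg_dir.
img = nib.load('inference/subject001__niftynet_out.nii.gz')
print(img.get_data_dtype(), img.shape)        # on-disk data type and shape of the saved map
prob = np.squeeze(img.get_fdata())            # assuming the last axis indexes the classes
print(np.allclose(prob.sum(axis=-1), 1.0))    # True if softmax probabilities were saved correctly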

mjorgecardoso commented 6 years ago

@kwang-greenhand, the segmentation application is the most used one, and I've never seen such a nice training curve give all zeros on a holdout set. There might be a bug, but it is more likely something else. Is your training/testing/holdout data normalised the same way and in the same orientation?

@wyli , any clue what it might be? The config file looks fine.

kwang-greenhand commented 6 years ago

Well, I normalized all the images the same way, and the training/validation/testing sets were randomly split by NiftyNet, so that shouldn't be the problem.

wyli commented 6 years ago

Hi @kwang-greenhand, to output probabilities, have you tried appending these parameters to your inference command: --label_normalisation False --output_prob True

Otherwise, I suspect the probabilistic output file is not saved in a proper data type. For the discrete output, what's the content of histogram.txt?

(By the way, what's the output of python net_segment.py --version? I couldn't replicate your results using the latest version of NiftyNet.)
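For concreteness, a hedged example of appending those parameters to the inference command the reporter describes further down in this thread (same config file and custom application):

net_run inference -c highres3DPlan4_CBF_loni.txt -a brats_segmentation.BRATSApp --label_normalisation False --output_prob True
python net_segment.py --version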

kwang-greenhand commented 6 years ago

Yeah, I tried that. For training, I have tried both with and without label normalization. When training with label normalization, if I turn label normalization off at inference the result doesn't make sense: the two class probabilities don't even add up to 1. When training without label normalization, they do add up to 1, but the highest value for class "1" is 0.5, so the result is basically still all zeros after the argmax...

wyli commented 6 years ago

I can't reproduce the issue. Could you update your last comment with the content of histogram.txt, the commands used for training/inference, and the NiftyNet version number (python net_segment.py --version)?

kwang-greenhand commented 6 years ago

Yeah sure.

The "brats_segmentation.py" is the application. The "histogram.txt" is generated only with "label_normalization", when without label_normalization, this file doesn't exist. I used net_run train -c highres3DPlan4_CBF_loni.txt -a brats_segmentation.BRATSApp and net_run inference -c highres3DPlan4_CBF_loni.txt -a brats_segmentation.BRATSApp for train and inference, except that I tried different options for output_prob and label_normalization as discussed before.

The package version is 0.2.2.

By the way, when I tried using only one subject as the training set, the network could overfit it well. I have no idea why it behaves differently with 51 subjects.

wyli commented 6 years ago

brats_segmentation.py was only created as a binary segmentation demo for the BRATS dataset; for a general-purpose segmentation application you should use python net_segment.py instead, which runs niftynet/application/segmentation_application.py.
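For reference, a hedged example of running the general-purpose application with the same config file (config name assumed from earlier in the thread; adjust paths as needed):

python net_segment.py train -c highres3DPlan4_CBF_loni.txt
python net_segment.py inference -c highres3DPlan4_CBF_loni.txt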

kwang-greenhand commented 6 years ago

My problem is a binary segmentation problem.

Actually I seem to be getting close to the reason, although I'm not sure yet. Maybe my volume size (16x16x16) is small relative to the receptive field of my network, so that maintaining the output dimension introduces too much zero padding during the forward pass, which leads to the all-zero output... I'm still waiting for the results of training on a larger volume to verify this.
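To make that concrete, here is a rough sketch (the layer dilations are hypothetical, not the exact network in the config) of how the receptive field of a stack of dilated 3x3x3 convolutions compares to a 16x16x16 window:

# Hypothetical dilation factors for a stack of 3x3x3 convolutions; not the exact network.
dilations = [1] * 7 + [2] * 6 + [4] * 6
receptive_field = 1
for d in dilations:
    receptive_field += 2 * d  # each 3x3x3 conv with dilation d widens the field by 2*d
print(receptive_field)  # 87 here, so a 16x16x16 SAME-padded window is mostly zero padding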

If that's the real reason, then I'm so sorry I've wasted so much of your time. I'll let you know the results.

YilinLiu97 commented 6 years ago

Hi all, I have the exact same problem with deepmedic and my own data. I'm using version 0.2.1.

YilinLiu97 commented 6 years ago

Hi @kwang-greenhand , have you resolved this issue? My volume size is (57,57,57) but I still got this problem...

kwang-greenhand commented 6 years ago

@YilinLiu97 Two questions:

  1. Are your classes relatively balanced, or severely imbalanced? (See the sketch after this list for a quick way to check.) If you have tons of background and only a few foreground voxels, the network is definitely more likely to predict 0, and that needs extra attention to fix, for example using generalized Dice with 1/volume^2 weights or similar.
  2. What network are you using, and what is its receptive field? Did you use 'VALID' convolutions or 'SAME'? I'm not sure, but I think this is a potential reason: with 'VALID' it's fine, but with 'SAME' and a very large receptive field you are zero-padding a lot, which might lead to the issue. These are all just my own reflections, not official input, so don't take them as a lifesaver...
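A minimal sketch of that balance check, assuming the label volumes are NIfTI files (the file name below is hypothetical):

import nibabel as nib
import numpy as np

# Hypothetical label file name; point this at one of your training label volumes.
seg = nib.load('labels/subject001_seg.nii.gz').get_fdata()
foreground = np.count_nonzero(seg)
background = seg.size - foreground
print(foreground, background, foreground / seg.size)  # a tiny foreground fraction means severe imbalance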

YilinLiu97 commented 6 years ago

@kwang-greenhand Thank you so much!! I guess my problem falls into the first category, as my ROIs are indeed very small...