MIC-DKFZ / nnUNet

Multi-modalities GTV auto-segmentation #1320

Closed duwang2015 closed 11 months ago

duwang2015 commented 1 year ago

Intro: We are trying to do head-and-neck (HN) GTV auto-segmentation with both CT and PET images.

Issue: We manually ran the validation for each fold using the same input as "validation_raw_postprocessed" but got a much lower Dice.

Model: 3d_fullres. Input images: cropped (144x144x144) CT and PET (SUV map) from the TCIA database. We also normalized the CT (to [-1, 1]) and the PET (z-score).

Training progress (fold 0): [training progress plot]

Dice compared with the original labels: around 0.6-0.7.
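For reference, a minimal sketch of this kind of normalization (the HU window below is an illustrative assumption, not necessarily the exact values we used):

```python
import numpy as np

def normalize_ct(ct, hu_min=-1000.0, hu_max=1000.0):
    # Clip to an assumed HU window, then rescale linearly to [-1, 1]
    ct = np.clip(ct, hu_min, hu_max)
    return 2.0 * (ct - hu_min) / (hu_max - hu_min) - 1.0

def normalize_pet(suv):
    # Per-volume z-score of the SUV map
    return (suv - suv.mean()) / (suv.std() + 1e-8)
```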

Manually running validation with the same cases and image input:

```
nnUNet_predict -i /trained_models/nnUNet/3d_fullres/Task511_HN_Target_CT_PT/nnUNetTrainerV2nnUNetPlansv2.1/fold_0/validation_data -o /trained_models/nnUNet/3d_fullres/Task511_HN_Target_CT_PT/nnUNetTrainerV2nnUNetPlansv2.1/fold_0/validation_predicted -m 3d_fullres -t Task511_HN_Target_CT_PT -f 0 -chk model_best --save_npz
```

Dice compared with the original labels: around 0.1-0.3.

The following image shows one sample case at the same slice level (from left to right: SUV map; original GTV contour (label); automatically generated validation (Dice = 0.697); manual validation (Dice = 0.024)):

[comparison image]

In general, our manual validation result shouldn't differ this much from the automatically generated validation result. We also tested using CT only and got similar results.
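(Side note: as far as I recall the v1 CLI, the automatic validation of a trained fold can also be re-run directly, which may help when comparing the two pipelines; the flag name is from memory and may differ:

```
nnUNet_train 3d_fullres nnUNetTrainerV2 Task511_HN_Target_CT_PT 0 -val
```
)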

Please let me know if you have any suggestions on this issue. Thank you!

FabianIsensee commented 1 year ago

Hey, thanks for bringing this up. This is indeed not intended behavior. I would need to reproduce this in order to be able to fix it though. Would it be possible to share the dataset & trained model (1 fold is enough)?

duwang2015 commented 1 year ago

Hi, please see the attached zip file; it includes my final checkpoint and the best model for fold 0. I included 5 validation cases here; let me know if you need more.

Issue.zip

FabianIsensee commented 1 year ago

Hey, the file seems corrupted. It has a size of 0 bytes. Could you try to upload it again? If that doesn't work, upload it to Google Drive (or whatever) and share the link here.

duwang2015 commented 1 year ago

Sure. Please find the link below: https://drive.google.com/file/d/1deLu4UdrcnddmVgJbVbK-geQ738BJ6Y9/view?usp=share_link

duwang2015 commented 1 year ago

Update: We ran a test using original (unnormalized) CT images only (no PET) and kept the rest of the settings the same as before. This time the automatically generated validation (Dice = 0.5220) and the manual validation (Dice = 0.5226) are comparable.

FabianIsensee commented 1 year ago

Thanks for sharing and apologies for ghosting you. I barely find the time to do anything these days. I am currently downloading the file and will try to take a look the week after next (🙈 sorry)

duwang2015 commented 1 year ago

> Thanks for sharing and apologies for ghosting you. I barely find the time to do anything these days. I am currently downloading the file and will try to take a look the week after next (🙈 sorry)

No problem, thank you for your assistance. Also, since we are testing on our end as well, we have some questions about preprocessing:

  1. I saw there is a normalization function in the default_preprocessor.py file; will this affect images that have already been normalized? (See the sketch after this list.)
  2. We tried to disable the preprocessing by adding -no_pp, as in nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity -no_pp. However, this prevents us from training, as no preprocessed images are produced.
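For context on question 1, nnU-Net's CT normalization roughly follows the published scheme: clip to foreground-intensity percentiles collected from the raw training data, then apply a global z-score. A sketch (function and property names are illustrative and vary between nnU-Net versions):

```python
import numpy as np

def nnunet_ct_normalization(data, props):
    # props: foreground-intensity statistics gathered by
    # nnUNet_plan_and_preprocess from the *raw* training images
    data = np.clip(data, props["percentile_00_5"], props["percentile_99_5"])
    return (data - props["mean"]) / props["sd"]
```

Because these statistics are computed from whatever is in nnUNet_raw_data, this step is applied on top of any normalization already baked into the images.
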
FabianIsensee commented 1 year ago

Hey, I was able to generate some segmentations with the data you sent me, but they don't make a lot of sense. Is the model you sent me fully trained? I think I will need the full dataset to be able to tell what's going on. But if you already solved the problem then that's fine, too :-)

HN-CHUM-021.nii.gz HN-CHUM-004.nii.gz HN-CHUM-006.nii.gz HN-CHUM-013.nii.gz HN-CHUM-019.nii.gz

duwang2015 commented 1 year ago

[training log screenshot]

Yes, the model was trained for the full 1000 epochs, as indicated in the screenshot above from the log file.

I haven't solved the problem yet. I'm trying to skip the preprocessing step, since my image data were already cropped and normalized, but it seems the training won't start without running the preprocessing. Do you have any suggestions on this?

Could you share your email address so I can send you the full dataset? Thank you!

FabianIsensee commented 1 year ago

f.isensee at the dkfz.de domain. Preprocessing is mandatory in nnU-Net, and I am pretty sure that it is not the problem. I will need to take a look at the data myself. Best, Fabian

Karol-G commented 1 year ago

Hey @duwang2015,

I am taking over some of Fabian's issues as he is quite busy at the moment. Is there any update from your side, or has this issue perhaps already been solved?

Best, Karol

duwang2015 commented 1 year ago

Hi Karol,

Thanks for handling this matter. As indicated earlier, we've now observed similar outcomes between the automatic and manual validation this round. Nevertheless, the results weren't satisfactory, with around 30% of cases showing a very low Dice score (<0.4). We're currently working hard to address this issue, and I'd greatly appreciate any suggestions you might have.

Best, Du

Karol-G commented 1 year ago

Hi Du,

To confirm: you managed to solve the issue of the automatic validation and the manual prediction producing different results? If yes, was this a problem caused by nnU-Net?

A separate issue you have now is that the results in general are not satisfactory, right? Fabian had a short look at the data you shared with him, and it seemed to him that the ground truth segmentations might not be of sufficient quality, or that the tumors you are trying to segment have a very low signal and would therefore require more cases to solve this task reliably.

However, he only had a short look at the data due to time constraints. If you give permission, I can also have a more in-depth look at the data to confirm his findings and/or check for other indicators that could explain your problem.

Best, Karol

duwang2015 commented 1 year ago

Hi Karol,

We managed to resolve the discrepancy between the automatic validation prediction and the manual prediction by using unnormalized CT images. However, we're uncertain whether that was the underlying cause and would like to know why it doesn't work with normalized images. It would be greatly appreciated if you could take a look at both the model and the data we shared above, regarding this issue as well as the low-performance problem.

All the data previously shared is accessible to you. Please don't hesitate to let me know if you encounter any problems.

Thank you! Du

Karol-G commented 1 year ago

Hi Du,

> We managed to resolve the discrepancy between the automatic validation prediction and the manual prediction by using unnormalized CT images. However, we're uncertain whether that was the underlying cause and would like to know why it doesn't work with normalized images.

The images used for prediction need to be in the same state as the images in nnUNet_raw_data, because nnU-Net applies the same preprocessing steps during inference as it does to the training data in nnUNet_raw_data. So if you apply some kind of normalization to the images you use for inference but did not apply that normalization to the training images, the predictions will be of low quality.
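A quick way to check for such a mismatch (not part of nnU-Net; the paths below are hypothetical) is to compare intensity statistics of a training image against an image you intend to predict on:

```python
import nibabel as nib
import numpy as np

def intensity_stats(path):
    # Load a NIfTI volume and summarize its intensity distribution
    data = nib.load(path).get_fdata()
    return {
        "min": float(data.min()),
        "max": float(data.max()),
        "mean": float(data.mean()),
        "p99.5": float(np.percentile(data, 99.5)),
    }

# Hypothetical paths: one training image vs. one intended inference image
print(intensity_stats("nnUNet_raw_data/Task511_HN_Target_CT_PT/imagesTr/case01_0000.nii.gz"))
print(intensity_stats("fold_0/validation_data/case01_0000.nii.gz"))
```

Large differences in range or scale (e.g. HU values vs. [-1, 1]) indicate that the two sets of inputs were not prepared consistently.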

Regarding the training data, I inspected some cases visually, and it also seems to me that the ground truth segmentations are of low quality. The segmentations appear very coarse and often overlap with bone or air on the CT scan. Occasionally, the segmentation also does not follow / lags behind the PET signal. I am not a medical expert, but for some of the inspected cases it is not clear whether there is even a tumor, or at least enough signal to detect one.

I strongly recommend improving the ground truth segmentations, as that is most likely the source of your problem. If you used some interactive segmentation method (random forest etc.) to produce the ground truth, that can also be an issue.

I hope I was able to give some helpful advice!

Best, Karol

duwang2015 commented 1 year ago

Thank you, Karol!

We observed the discrepancy under 5-fold cross-validation, and both the training and validation data were pre-cropped and normalized. We intend to revisit our raw data to investigate whether there were any issues in the preprocessing stages.

Thank you for your guidance.

Best, Du