MIC-DKFZ / nnUNet

Apache License 2.0

Pseudo dice [0.0, 0.0, 0.0, nan, nan] #2402

Closed NGYLK closed 1 week ago

NGYLK commented 1 month ago

Hello, I encountered the following problem while using the ProstateX dataset to segment lesions. The data and masks I used are from the following sources:

Data: https://wiki.cancerimagingarchive.net/display/Public/SPIE-AAPM-NCI+PROSTATEx+Challenges
Masks: https://github.com/rcuocolo/PROSTATEx_masks

Since my goal is not to segment the prostate but to segment various lesions, I modified the JSON file as follows:

    {
        "channel_names": {
            "0": "T2",
            "1": "ADC",
            "2": "PD-W",
            "3": "Ktrans"
        },
        "labels": {
            "background": 0,
            "lesion area 1": 1,
            "lesion area 2": 2,
            "lesion area 3": 3,
            "lesion area 4": 4,
            "lesion area 5": 5
        },
        "numTraining": 138,
        "file_ending": ".nii.gz",
        "overwrite_image_reader_writer": "SimpleITKIO"
    }

After processing the dataset, I performed 3d_fullres training. The results were similar to those obtained using nnUNetv2_train 005 3d_fullres 0 -p nnUNetResEncUNetMPlans, with the Dice score being 0 or very small. The output is as follows:

    /home/usst/znn/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:1119: RuntimeWarning: invalid value encountered in scalar divide
      global_dc_per_class = [i for i in [2 * i / (2 * i + j + k) for i, j, k in zip(tp, fp, fn)]]
    2024-07-27 18:46:09.659058: train_loss 0.124
    2024-07-27 18:46:09.660206: val_loss 0.0098
    2024-07-27 18:46:09.660450: Pseudo dice [0.0, 0.0, 0.0, nan, nan]
    2024-07-27 18:46:09.660632: Epoch time: 287.42 s
    2024-07-27 18:46:09.660773: Yayy! New best EMA pseudo Dice: 0.0
    2024-07-27 18:46:14.580748:
    2024-07-27 18:46:14.581524: Epoch 1
    2024-07-27 18:46:14.582067: Current learning rate: 0.00999
    2024-07-27 18:47:32.719907: train_loss 0.0049
    2024-07-27 18:47:32.721000: val_loss -0.0055
    2024-07-27 18:47:32.721253: Pseudo dice [0.0, 0.0, 0.0, nan, nan]
    2024-07-27 18:47:32.721439: Epoch time: 78.15 s
    2024-07-27 18:47:35.457978:
    2024-07-27 18:47:35.458524: Epoch 2
    2024-07-27 18:47:35.458964: Current learning rate: 0.00998
    2024-07-27 18:48:53.375348: train_loss -0.0046
    2024-07-27 18:48:53.376186: val_loss -0.0123
    2024-07-27 18:48:53.376576: Pseudo dice [0.0013, 0.0, 0.0, nan, nan]
    2024-07-27 18:48:53.376763: Epoch time: 77.93 s
    2024-07-27 18:48:53.376897: Yayy! New best EMA pseudo Dice: 0.0
    2024-07-27 18:49:08.874014:
    2024-07-27 18:49:08.874517: Epoch 3
    2024-07-27 18:49:08.874846: Current learning rate: 0.00997
    2024-07-27 18:50:26.657454: train_loss -0.0089
    2024-07-27 18:50:26.658200: val_loss -0.014
    2024-07-27 18:50:26.658440: Pseudo dice [0.0, 0.0, 0.0, nan, nan]
    2024-07-27 18:50:26.658866: Epoch time: 77.79 s
    2024-07-27 18:50:29.494811:

Can you help me resolve my confusion? Thank you very much.

gaojh135 commented 1 month ago

You could try reducing the learning rate.

NGYLK commented 1 month ago

Hello, I reduced the learning rate by a factor of 10, but the Dice results are worse than before and the epoch time is very long. It seems that lowering the learning rate is not the solution. Could it be that my dataset is incompatible, or is there another issue?

NGYLK commented 1 month ago

I tried increasing the learning rate, but the results were still unsatisfactory. Can you help me figure out which step caused the problem? Thank you.

    2024-07-28 16:34:02.685723:
    2024-07-28 16:34:02.686363: Epoch 45
    2024-07-28 16:34:02.686632: Current learning rate: 0.09594
    2024-07-28 16:35:20.624577: train_loss -0.0778
    2024-07-28 16:35:20.625140: val_loss -0.0742
    2024-07-28 16:35:20.625381: Pseudo dice [0.104, 0.0, 0.0, nan, nan]
    2024-07-28 16:35:20.625580: Epoch time: 77.95 s
    2024-07-28 16:35:23.363433:
    2024-07-28 16:35:23.364056: Epoch 46
    2024-07-28 16:35:23.364567: Current learning rate: 0.09585
    2024-07-28 16:36:41.368167: train_loss -0.0765
    2024-07-28 16:36:41.368969: val_loss -0.0837
    2024-07-28 16:36:41.369219: Pseudo dice [0.1637, 0.0, 0.0, nan, nan]
    2024-07-28 16:36:41.369422: Epoch time: 78.01 s
    2024-07-28 16:36:41.369570: Yayy! New best EMA pseudo Dice: 0.0425
    2024-07-28 16:36:50.926840:
    2024-07-28 16:36:50.927459: Epoch 47
    2024-07-28 16:36:50.927812: Current learning rate: 0.09576
    2024-07-28 16:38:08.796351: train_loss -0.0729
    2024-07-28 16:38:08.797282: val_loss -0.0797
    2024-07-28 16:38:08.797523: Pseudo dice [0.1445, 0.0, 0.0, nan, nan]
    2024-07-28 16:38:08.797719: Epoch time: 77.88 s
    2024-07-28 16:38:08.797858: Yayy! New best EMA pseudo Dice: 0.0431
    2024-07-28 16:38:20.160509:
    2024-07-28 16:38:20.161460: Epoch 48
    2024-07-28 16:38:20.161932: Current learning rate: 0.09567
    2024-07-28 16:39:38.040921: train_loss -0.0759
    2024-07-28 16:39:38.041884: val_loss -0.0863
    2024-07-28 16:39:38.042135: Pseudo dice [0.1263, 0.0, 0.0, nan, nan]
    2024-07-28 16:39:38.042338: Epoch time: 77.89 s
    2024-07-28 16:39:40.893677:
    2024-07-28 16:39:40.894322: Epoch 49
    2024-07-28 16:39:40.894824: Current learning rate: 0.09558
    2024-07-28 16:40:58.759103: train_loss -0.0762
    2024-07-28 16:40:58.760003: val_loss -0.0861
    2024-07-28 16:40:58.760440: Pseudo dice [0.1677, 0.0073, 0.0, nan, nan]
    2024-07-28 16:40:58.760651: Epoch time: 77.87 s
    2024-07-28 16:41:14.789259: Yayy! New best EMA pseudo Dice: 0.0445

ykirchhoff commented 1 month ago

Hi @NGYLK,

Could you give some more details about the dataset? It seems like classes 4 and 5 are either not present at all or at least very rare. Other than that, this seems like a rather hard task, but I would need more details here. Did you finish a training run, and if so, what is the Dice reported in the summary.json?

Best, Yannick
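
To illustrate why the log shows `nan` for the last two classes: the warning in the log comes from the expression `2 * tp / (2 * tp + fp + fn)`, which becomes 0/0 when a class has no true positives, false positives, or false negatives in the validation batches, i.e. the class is neither annotated nor predicted. The following is a minimal NumPy sketch of that formula, not the nnUNet implementation itself; the voxel counts are made up for illustration.

```python
import numpy as np


def pseudo_dice(tp, fp, fn):
    """Per-class Dice from summed true positives, false positives, false negatives.

    Classes with tp = fp = fn = 0 get NaN, because their Dice is undefined (0/0).
    """
    tp, fp, fn = (np.asarray(x, dtype=float) for x in (tp, fp, fn))
    denominator = 2 * tp + fp + fn
    # Suppress the divide warning; undefined entries are replaced by NaN below.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denominator > 0, 2 * tp / denominator, np.nan)


# Hypothetical voxel counts for five foreground classes.
# Classes 4 and 5 never occur and are never predicted.
tp = [0, 0, 0, 0, 0]
fp = [10, 0, 5, 0, 0]
fn = [200, 150, 90, 0, 0]

dice = pseudo_dice(tp, fp, fn)
print(dice)
```

A value of 0.0 therefore means the class is present or predicted but never hit, whereas `nan` means the class did not appear at all in the evaluated patches.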

NGYLK commented 1 month ago

Hello, thank you for your reply. I am not trying to solve five separate segmentation tasks; my goal is to segment the lesions in the ProstateX dataset. In my label files a single prostate can contain 3 to 4 lesions, so I made the following changes to the JSON file. In the screenshot below, for example, one label file contains 2 lesions. Is there something wrong with how I modified the JSON file?

    {
        "channel_names": {
            "0": "T2",
            "1": "ADC",
            "2": "PD-W",
            "3": "Ktrans"
        },
        "labels": {
            "background": 0,
            "lesion_area1": 1,
            "lesion_area2": 2,
            "lesion_area3": 3,
            "lesion_area4": 4,
            "lesion_area5": 5
        },
        "numTraining": 138,
        "file_ending": ".nii.gz",
        "overwrite_image_reader_writer": "SimpleITKIO"
    }

[Screenshot 2024-08-01 111721]

Data: https://wiki.cancerimagingarchive.net/display/Public/SPIE-AAPM-NCI+PROSTATEx+Challenges
Masks: https://github.com/rcuocolo/PROSTATEx_masks

If you need more information about the dataset, please tell me which part.

ykirchhoff commented 1 month ago

Hi @NGYLK,

did I get that right that your labels are for separate lesions, i.e. if you have three lesions in the prostate they are assigned labels 1 to 3? This won't work with nnUNet, as it is not capable of instance segmentation but only semantic segmentation, i.e. assigning the lesion label independently of the instance. You can get instances from the predictions using connected component analysis, but the predictions themselves can only differentiate between different classes.

Best, Yannick
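
As a rough sketch of what this means in practice: the per-lesion ids can be collapsed into a single "lesion" class before training, and individual lesions can be recovered afterwards from the predicted mask with connected component analysis. The example below uses `scipy.ndimage.label` on a toy array; the function names and the toy data are illustrative assumptions, not part of nnUNet.

```python
import numpy as np
from scipy import ndimage


def instances_to_semantic(instance_mask):
    """Collapse per-lesion ids (1, 2, 3, ...) into one semantic class: 1 = lesion."""
    return (np.asarray(instance_mask) > 0).astype(np.uint8)


def semantic_to_instances(semantic_mask, lesion_label=1):
    """Split one semantic class into connected components (one id per lesion)."""
    components, num_components = ndimage.label(np.asarray(semantic_mask) == lesion_label)
    return components, num_components


# Toy volume with two separate lesions that were annotated as labels 1 and 2.
toy = np.zeros((1, 6, 6), dtype=np.uint8)
toy[0, 0:2, 0:2] = 1
toy[0, 4:6, 4:6] = 2

semantic = instances_to_semantic(toy)  # what a semantic model can learn
components, num_lesions = semantic_to_instances(semantic)  # post-processing step

print(np.unique(semantic), num_lesions)
```

With this setup the "labels" entry of the dataset JSON would only need background and one lesion class.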

NGYLK commented 1 month ago

Thank you for your response. I now understand where my issue lies. I am now in the process of using nnUNetV2 to segment the prostate, PZ (Peripheral Zone), and TZ (Transition Zone). These are three different semantic segmentation tasks. The naming convention I used in my JSON file is the same as the one you provided in Dataset conversion:

    {
        "channel_names": {
            "0": "T2",
            "1": "ADC",
            "2": "PD-W",
            "3": "Ktrans"
        },
        "labels": {
            "background": 0,
            "Prostate": 1,
            "PZ": 2,
            "TZ": 3
        },
        "numTraining": 174,
        "file_ending": ".nii.gz",
        "overwrite_image_reader_writer": "SimpleITKIO"
    }

When naming the files in labelsTr, I distinguished the masks with the suffixes _0001, _0002, and _0003, as shown in the figure below. However, preprocessing could not find my mask files. The error message is:

    AssertionError: not all training cases have a label file in labelsTr. Fix that. Missing: ['ProstateX-0000', 'ProstateX-0001', 'ProstateX-0002', 'ProstateX-0003', ...]

This suggests that each case in labelsTr can only have one segmentation mask. Is there an error in the way I named the mask files?

[Screenshot 2024-08-08 144046]

ykirchhoff commented 1 month ago

You need just one label file without the suffix (_000i) and the different labels should be indicated by the respective numbers in this single file.
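
One way to get there from separate per-structure masks is to write them into a single integer label map, using the values from the "labels" entry of the dataset JSON. The sketch below works on NumPy arrays only (reading and writing the .nii.gz files is left out). The helper name, the toy masks, and the overwrite order for overlapping voxels (later labels win, so PZ and TZ overwrite the whole-prostate label) are assumptions for illustration.

```python
import numpy as np

# Label values as given in the dataset JSON above.
LABELS = {"Prostate": 1, "PZ": 2, "TZ": 3}


def merge_masks(masks):
    """Merge binary masks (dict: name -> array) into one integer label map.

    Masks are written in the order of LABELS, so where masks overlap the
    label written last wins.
    """
    shape = np.asarray(masks["Prostate"]).shape
    merged = np.zeros(shape, dtype=np.uint8)  # 0 = background
    for name, value in LABELS.items():
        mask = np.asarray(masks[name]) > 0
        if mask.shape != shape:
            raise ValueError(f"mask '{name}' has shape {mask.shape}, expected {shape}")
        merged[mask] = value
    return merged


# Toy example: a 1 x 4 x 4 volume with hypothetical masks.
prostate = np.zeros((1, 4, 4), dtype=np.uint8)
prostate[0, 1:4, 0:4] = 1
pz = np.zeros_like(prostate)
pz[0, 1:4, 0:2] = 1
tz = np.zeros_like(prostate)
tz[0, 1:4, 2:3] = 1

label_map = merge_masks({"Prostate": prostate, "PZ": pz, "TZ": tz})
print(label_map[0])
```

The merged array would then be saved once per case, named by the case identifier without a channel suffix (the identifiers listed in the error message, such as ProstateX-0000, plus the file ending).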

NGYLK commented 1 month ago

Thank you for your reply. I have resolved the issues I encountered during training. However, I have run into a problem during prediction and when selecting the best model, because I trained nnUNet with folds. The command I entered is:

nnUNetv2_predict -i /home/usst/znn/nnUNet/DATASET/nnUNet_results/Dataset005_Prostate/nnUNetTrainernnUNetResEncUNetMPlans -o /home/usst/znn/nnUNet/DATASET/nnUNet_results/Dataset005_Prostate/nnUNetTrainernnUNetResEncUNetMPlans -d 005 -c3d_fullres -f 1

Yet the output result is:

    #######################################################################
    Please cite the following paper when using nnU-Net:
    Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
    #######################################################################

    There are 0 cases in the source folder
    I am process 0 out of 1 (max process ID is 0, we start counting with 0!)
    There are 0 cases that I would like to predict

Training finished without any issues, so I am wondering why no cases are found. I hope to receive your reply.

ykirchhoff commented 1 month ago

Hi,

the input folder you provide with the -i argument is not the results folder of your training. The trained model is selected by the -d, -c and -p arguments (note that you have to set -p for the ResEncM). -i should point to the folder containing the test cases you want to predict.

Best, Yannick
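
A small sketch of how one might check an input folder before predicting, and what the argument list could look like with -p set. It assumes the test images follow the same naming as the training images (one file per channel with suffixes _0000 to _0003 for the four channels in the dataset JSON); the folder names and helper functions are hypothetical, and the demo only creates empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def complete_cases(input_dir, file_ending=".nii.gz", num_channels=4):
    """Return case identifiers for which all channel files are present."""
    input_dir = Path(input_dir)
    suffix = f"_0000{file_ending}"
    cases = []
    for first_channel in sorted(input_dir.glob(f"*{suffix}")):
        case = first_channel.name[: -len(suffix)]
        expected = [input_dir / f"{case}_{c:04d}{file_ending}" for c in range(num_channels)]
        if all(path.is_file() for path in expected):
            cases.append(case)
    return cases


def build_predict_command(input_dir, output_dir, dataset="005",
                          configuration="3d_fullres", fold="1",
                          plans="nnUNetResEncUNetMPlans"):
    """Assemble the nnUNetv2_predict arguments; -i is the folder with test images."""
    return ["nnUNetv2_predict", "-i", str(input_dir), "-o", str(output_dir),
            "-d", dataset, "-c", configuration, "-f", fold, "-p", plans]


# Demo with empty placeholder files (hypothetical folder layout).
with tempfile.TemporaryDirectory() as tmp:
    test_dir = Path(tmp) / "imagesTs"
    test_dir.mkdir()
    for channel in range(4):
        (test_dir / f"ProstateX-0000_{channel:04d}.nii.gz").touch()
    (test_dir / "ProstateX-0001_0000.nii.gz").touch()  # incomplete case
    cases = complete_cases(test_dir)
    command = build_predict_command(test_dir, Path(tmp) / "predictions")

print(cases)
print(" ".join(command))
```

If the check returns an empty list for the folder passed to -i, the "There are 0 cases in the source folder" message is the expected outcome.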

ykirchhoff commented 1 week ago

Hey,

did you get it running? If not, please let me know what the error is.

Best, Yannick