Reproducing the results for BTCV (8-abdominal multi organ segmentation) dataset

ShahinaKK commented 10 months ago

Hello,

Thank you for providing the code. I have a couple of questions about the results reported for the BTCV dataset (abdominal multi-organ segmentation of 8 organs):

1.) In Section 3.1 of the paper, it is mentioned that the 18:12 data split (same as nnFormer) was followed, using a patch size of 48 x 192 x 192. However, the values reported in Table 2 for models like nnFormer (DSC: 86.5) and UNETR++ (DSC: 87.22) seem to be the same as those reported in their corresponding papers, where models were trained with a patch size of 64 x 128 x 128. Could you please confirm whether all the results in Table 2 were obtained by training models using the same patch size?

2.) I attempted to reproduce the BTCV result (DSC: 87.84) using the default training setup from the GitHub repository (batch size: 2, patch size: 48 x 192 x 192). However, the Dice scores I obtained were around 83. Could you kindly provide more details on how to reproduce the value reported in the paper?

Thank you for your assistance.

Best regards.

PengchengShi1220 commented 10 months ago

Hello,

Thank you for your message and for your interest in our work!

1.) Regarding your question about our experimental setup: all results reported in our paper were obtained with version 1.7.1 of the nnUNet framework. You can access this specific version in our repository NexToU_nnunetv1. For NexToU, we followed nnU-Net's recommended patch size of 48 x 192 x 192. For nnFormer and UNETR++, we used the patch size of [64, 128, 128] specified in their original studies. We are actively upgrading our codebase, including integrating additional network architectures and improving the reproducibility of our results, in NexToU_nnunetv2, which is built on nnUNet version 2.
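For readers who want to change these settings themselves: in nnUNet v1 the batch size and patch size of each resolution stage live in the pickled plans file (for example nnUNetPlansv2.1_plans_3D.pkl in the task's preprocessed folder) under plans["plans_per_stage"]. The sketch below is a hypothetical helper, not the repository's change_batch_size_patch_size.py script; it assumes that 3d_fullres trains on the last stage, and it deliberately leaves the stored pooling and convolution kernel sizes untouched, even though those were derived from the original patch size and may also need updating.

```python
import pickle
from pathlib import Path

import numpy as np


def override_fullres_plans(plans: dict, batch_size: int, patch_size) -> dict:
    """Return a copy of nnUNet-v1-style plans with the 3d_fullres stage's
    batch size and patch size replaced. Hypothetical helper for illustration."""
    stages = plans["plans_per_stage"]
    fullres_stage = max(stages)  # assumption: 3d_fullres trains on the last stage
    # Shallow copy: keys such as pool_op_kernel_sizes / conv_kernel_sizes are kept as-is.
    new_stage = dict(stages[fullres_stage])
    new_stage["batch_size"] = int(batch_size)
    new_stage["patch_size"] = np.array(patch_size, dtype=int)
    new_plans = dict(plans)
    new_plans["plans_per_stage"] = {**stages, fullres_stage: new_stage}
    return new_plans


def rewrite_plans_file(src: Path, dst: Path, batch_size: int, patch_size) -> None:
    """Load a pickled plans file, override the fullres settings, and save a copy."""
    with open(src, "rb") as f:
        plans = pickle.load(f)
    with open(dst, "wb") as f:
        pickle.dump(override_fullres_plans(plans, batch_size, patch_size), f)
```

For example, rewrite_plans_file(src, dst, 2, [48, 192, 192]) writes a modified copy; writing to a new file rather than overwriting keeps the original plans available for comparison.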

2.) As for the Dice score you obtained: 83 is quite surprising and well below our reported score of 87.84. As shown in Table 4 of our paper, even without the Pool GNN and Swin GNN modules the Dice coefficient is expected to be around 85.19. The difference could be due to the number of classes used in your training compared to ours: we trained on 13 classes and evaluated on 8 of them, which is consistent with the approach adopted by nnFormer.
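To make the "train on 13 classes, evaluate on 8" distinction concrete, the sketch below computes per-organ Dice on full 13-class label maps and averages only over the eight organ labels listed later in this thread (1, 2, 3, 4, 6, 7, 8, 11). It is a minimal NumPy illustration, not the evaluation script used for the paper; how absent organs are scored and whether scores are averaged per case first are assumptions that can shift the final number.

```python
import numpy as np

# Organ labels evaluated in this thread; in the standard BTCV numbering these are
# spleen, right/left kidney, gallbladder, liver, stomach, aorta and pancreas.
EVAL_LABELS = (1, 2, 3, 4, 6, 7, 8, 11)


def dice(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    """Dice coefficient for a single label in two integer label maps."""
    p = pred == label
    g = gt == label
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # assumption: label absent from both counts as a perfect match
    return float(2.0 * np.logical_and(p, g).sum() / denom)


def mean_dice(pred: np.ndarray, gt: np.ndarray, labels=EVAL_LABELS) -> float:
    """Average Dice over the selected labels only; other labels are ignored."""
    return float(np.mean([dice(pred, gt, lab) for lab in labels]))
```

Errors on labels outside EVAL_LABELS (for example the esophagus, label 5) do not affect the reported average, which is why the same trained model can give different numbers under 8-organ and 13-organ evaluation.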

If there's anything more we can help clarify, or if you need additional support, please don't hesitate to get back to us.

Best regards, Pengcheng

ShahinaKK commented 10 months ago

Thank you for the prompt response.

I trained the model using the code in the NexToU_nnunetv1 branch of your repo with the command "nnUNet_train 3d_fullres nnUNetTrainerV2_NexToU_BTI_Synapse Task111_Synapse_CT 0", a patch size of 48 x 192 x 192, and 13 classes. I then evaluated the results on the eight organs (labels 1, 2, 3, 4, 6, 7, 8, 11), and the average Dice score was around 83.
Could you kindly confirm whether the data split used for training matches the one given in the nnFormer repository (shown in the figure)?
[Figure: BTCV_synapsesplit]

Thanks,

PengchengShi1220 commented 10 months ago

@ShahinaKK Thank you for reaching out and for your interest in the model training process.

I can confirm that the data split used for the training is indeed consistent with the one provided in the nnFormer repository.

I'd like to address your concern about the lower Dice scores. While updating our codebases (NexToU_nnunetv1 and NexToU_nnunetv2), we found that the "position embedding" code, which is crucial for dynamic graph construction and graph convolution in Vision GNN, had been accidentally dropped between versions. This omission likely degraded the model's performance and contributed to the lower Dice scores you observed.
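For context on why a missing position term matters: in Vision GNN-style blocks, each node's neighbours are chosen by a k-nearest-neighbour search, and a relative position term biases that search toward spatially nearby nodes. The NumPy sketch below illustrates the idea by adding a weighted squared spatial distance to the feature distance. This is a simplified stand-in for the relative position term used in ViG-style code, not NexToU's implementation, and the weight alpha is a hypothetical parameter.

```python
import numpy as np


def knn_graph(features: np.ndarray, coords: np.ndarray, k: int, alpha: float = 0.0) -> np.ndarray:
    """Return an (N, k) array of neighbour indices for N nodes.

    features: (N, C) node features; coords: (N, 3) node positions.
    Distance = squared feature distance + alpha * squared spatial distance
    (a simplified position term; alpha is a hypothetical weight).
    """
    feat_d = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    pos_d = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    dist = (feat_d + alpha * pos_d).astype(float)
    np.fill_diagonal(dist, np.inf)  # exclude each node from its own neighbour list
    return np.argsort(dist, axis=1, kind="stable")[:, :k]
```

With alpha = 0 a node may connect to a feature-similar node on the far side of the volume; with a positive alpha the same node prefers nearby voxels, so dropping the position term silently changes which edges the graph convolution sees.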

We have now rectified this by reintegrating the "position embedding" code in both NexToU_nnunetv1 and NexToU_nnunetv2. You can review the updates here:

NexToU_nnunetv1: Position Embedding Code Integration
NexToU_nnunetv2: Position Embedding Code Integration

I recommend retraining the model with the updated code. This should provide a more accurate replication of the results reported in our paper.

Additionally, I've attached a compressed file (NexToU_BTI_bs_2_ps_48_192_192.zip) containing the predictions that produced the average Dice score of 87.84 over the organs with labels 1, 2, 3, 4, 6, 7, 8, and 11.

If you need any further information or assistance, please do not hesitate to ask.

ShahinaKK commented 9 months ago

Thank you for your prompt response and for addressing the concerns regarding the model training process.

I retrained the model with the updated code (including the position embedding), but the average Dice scores from two runs were 83.44 and 83.98, which are still far from the values reported in the paper. If possible, could you please share the preprocessed BTCV dataset you used for training?

Thanks

wafaAlghallabi commented 9 months ago

Thank you for your efforts.

Could you please share the BTCV dataset used for training?

Thanks.

PengchengShi1220 commented 9 months ago

@ShahinaKK @wafaAlghallabi Thank you for reaching out with your concerns.

Our data preprocessing pipeline aligns with the setup used in nnUNet.

The raw BTCV data comes from the Synapse multi-organ CT dataset.

Raw training data: [image]
Our training data: [image]
They're identical!

The original data and labels for your reference are visible in the following image:

Regarding the CT data, one potential issue could be the configuration of the dataset.json file in nnUNet, which must specify the modality like this:

"modality": {
    "0": "CT"
}

For the BTCV dataset under nnUNet v1.7, the dataset.json file we employed can be found here: dataset.json
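Building on the modality note above, here is a minimal standard-library sketch of an nnUNet-v1-style dataset.json for this task. The label names follow the commonly used BTCV 13-organ numbering; the task name, case identifiers and description string are assumptions for illustration, and this is not a copy of the dataset.json linked above.

```python
import json

# Commonly used BTCV label numbering: background + 13 organs.
BTCV_LABELS = {
    "0": "background",
    "1": "spleen",
    "2": "right kidney",
    "3": "left kidney",
    "4": "gallbladder",
    "5": "esophagus",
    "6": "liver",
    "7": "stomach",
    "8": "aorta",
    "9": "inferior vena cava",
    "10": "portal vein and splenic vein",
    "11": "pancreas",
    "12": "right adrenal gland",
    "13": "left adrenal gland",
}


def make_dataset_json(train_ids, test_ids, name="Synapse_CT"):
    """Return an nnUNet-v1-style dataset.json dict (name/description are illustrative)."""
    return {
        "name": name,
        "description": "BTCV abdominal multi-organ CT",  # illustrative text
        "tensorImageSize": "3D",
        "modality": {"0": "CT"},  # the setting discussed in the comment above
        "labels": BTCV_LABELS,
        "numTraining": len(train_ids),
        "numTest": len(test_ids),
        # Paths are listed without the _0000 modality suffix used for the files on disk.
        "training": [
            {"image": f"./imagesTr/{i}.nii.gz", "label": f"./labelsTr/{i}.nii.gz"}
            for i in train_ids
        ],
        "test": [f"./imagesTs/{i}.nii.gz" for i in test_ids],
    }
```

The dict can be written with json.dump(d, f, indent=4). The "CT" value matters because nnUNet v1 applies its CT-specific intensity normalization (clipping to foreground percentiles and normalizing with dataset-wide statistics) only for CT modalities; other names fall back to per-image z-scoring, which changes the preprocessed data.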

Please feel free to ask if you require additional information or support.

ShahinaKK commented 9 months ago

Thank you for your efforts and the provided information. Unfortunately, the results haven't improved. Any additional insights or suggestions would be greatly appreciated. Thanks,

PengchengShi1220 commented 9 months ago

Thank you for your feedback. To ensure the reproducibility of the results, it's critical to control for variables outside of the network architecture. This includes consistent data partitioning for training and validation, as well as running the model through the full 1000 epochs as per nnUNet's original setup. Please verify that these settings are aligned with the nnUNet framework.
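One quick way to check that a run really completed the full schedule is to scan its training log for the last epoch number. The sketch below assumes nnUNet v1's log format, where each epoch begins with a line such as "epoch:  999"; that pattern is an assumption, so adjust the regular expression if your logs differ. Epochs are assumed to be zero-indexed, so a finished 1000-epoch run should end at epoch 999.

```python
import re

# Assumed nnUNet v1 log pattern: a line like "epoch:  999" at the start of each epoch.
EPOCH_RE = re.compile(r"^epoch:\s+(\d+)", re.MULTILINE)


def last_epoch(log_text: str) -> int:
    """Return the highest epoch index found in the log text, or -1 if none."""
    epochs = [int(m) for m in EPOCH_RE.findall(log_text)]
    return max(epochs, default=-1)


def completed_full_schedule(log_text: str, max_num_epochs: int = 1000) -> bool:
    """True if the log reached the final (zero-indexed) epoch of the schedule."""
    return last_epoch(log_text) >= max_num_epochs - 1
```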

In the coming days, I plan to upload the latest reproduction model weights for NexToU_BTI on the BTCV dataset. Stay tuned.

PengchengShi1220 commented 9 months ago

@ShahinaKK Hi Shahina,

I have uploaded the latest model weights for NexToU_BTI (NexToU_nnunetv1) on the BTCV dataset. This model achieves a Dice coefficient of 87.61 and a Hausdorff Distance (HD) of 7.42.

You can access and download the model weights using this link: NexToU_nnunetv1 Model Weights.

Best, Pengcheng

ShahinaKK commented 9 months ago

@PengchengShi1220 Hello, thank you for providing the model weights and log file; I truly appreciate your efforts in answering my questions. While reviewing the network printed in your training log, I noticed that it differs from the network initialized in the code. Because the full printout is long, I am sharing only the last layer of each network:

Based on the network initialization in the repo: [Figure: code_nw]
Based on the provided log file: [Figure: log_file]

Can you please confirm this? Thanks in advance
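When comparing two long network printouts like these, a plain text diff can pinpoint exactly where the architectures diverge. The sketch below uses Python's standard difflib on two printed module descriptions (for example str(network) from the code versus the block copied out of a training log); the example layer strings in the test are hypothetical.

```python
import difflib


def diff_architectures(printed_a: str, printed_b: str, name_a: str = "code", name_b: str = "log") -> list:
    """Return unified-diff lines between two printed network descriptions."""
    # Strip trailing whitespace so formatting noise does not show up as a difference.
    a_lines = [ln.rstrip() for ln in printed_a.splitlines()]
    b_lines = [ln.rstrip() for ln in printed_b.splitlines()]
    return list(
        difflib.unified_diff(a_lines, b_lines, fromfile=name_a, tofile=name_b, lineterm="")
    )
```

An empty result means the two printouts match line for line; otherwise the "-" and "+" lines show which layers or kernel sizes differ.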

PengchengShi1220 commented 9 months ago

@ShahinaKK Hi, I can assure you that the number of stages and the convolution parameters used to initialize the network in NexToU_nnunetv1 are consistent with the nnUNet framework. If you observe a difference, it may stem from variations in data preprocessing or in the environment setup.

To ensure consistency and aid in reproducing the results, I recommend the following steps:

1.) Download the original BTCV dataset.
2.) Clone the nnUNet 1.7.1 and NexToU_nnunetv1 repositories.
3.) Adjust batch_size and patch_size with the change_batch_size_patch_size.py script as needed.
4.) Follow the nnUNet preprocessing steps exactly as outlined.
5.) Use the change_split_file_111_split_18_12.py script to apply the dataset split for NexToU_nnunetv1.
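The final split step above can be sketched as follows. nnUNet v1 reads its folds from splits_final.pkl in the task's preprocessed folder (a list of dicts with "train" and "val" keys) and uses an existing file instead of generating its default 5-fold split; fold 0 is the one selected by the trailing 0 in the nnUNet_train command quoted earlier in this thread. The helper below is a hypothetical illustration, not the contents of change_split_file_111_split_18_12.py, and the actual 18 training and 12 validation case IDs must be taken from the nnFormer split.

```python
import pickle
from collections import OrderedDict

import numpy as np


def build_single_fold_split(train_ids, val_ids, n_train=18, n_val=12):
    """Return an nnUNet-v1-style split list with a single fold (fold 0).

    train_ids / val_ids: case identifiers as they appear in the preprocessed
    folder. The 18/12 sizes and disjointness are checked to catch mistakes.
    """
    train_ids, val_ids = sorted(set(train_ids)), sorted(set(val_ids))
    if len(train_ids) != n_train or len(val_ids) != n_val:
        raise ValueError(f"expected {n_train}/{n_val} unique cases, got {len(train_ids)}/{len(val_ids)}")
    if set(train_ids) & set(val_ids):
        raise ValueError("train and val sets overlap")
    fold = OrderedDict(train=np.array(train_ids), val=np.array(val_ids))
    return [fold]


def write_splits(splits, path):
    """Pickle the split list, e.g. to <preprocessed task folder>/splits_final.pkl."""
    with open(path, "wb") as f:
        pickle.dump(splits, f)
```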

By following these steps, the initial discrepancy should be resolved, and the network should initialize as intended.

Please feel free to ask if you require additional support.

PengchengShi1220/NexToU, issue #4