Project-MONAI / research-contributions

Implementations of recent research prototypes/demonstrations using MONAI.
https://monai.io/
Apache License 2.0

Reproducing the results of UNETR on BTCV competition #132

Closed AbdelrahmanShakerYousef closed 1 year ago

AbdelrahmanShakerYousef commented 2 years ago

Hello,

Thank you for sharing your work and the codebase.

I am trying to reproduce the UNETR results on the BTCV dataset reported in Table 1 (without using any additional data, on the standard leaderboard). The number in the paper is 0.856. However, when I retrained UNETR on the BTCV training data and submitted the predictions, the average DSC was 0.765. Could you please share the testing or submission scripts so the 0.856 result can be reproduced without additional data? Also, you release only the training and validation transforms; what about the testing transforms used for the submission? Are they the same as the validation transforms?

Thanks!

tangy5 commented 2 years ago

Hi @AbdelrahmanShakerYousef , thanks for the questions. This is really a general question about improving performance, so I can offer some suggestions:

1. A larger batch size might help; the model was trained on 32 GB x N GPUs.
2. A cross-validation model ensemble improves performance.
3. Post-processing such as keeping the largest connected component (LCC) can remove outlier regions.
4. Remove the DSC scores of missing organs.
5. Test-time augmentation helps as well.
6. The tutorial is beginner-level guidance; you may want to tune the parameters further or extend the training iterations so the model converges to a better optimum.

If the training/validation fold is 0, an average DSC of 82% is a reasonable internal validation performance.

Hope these help.
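
For point 3, here is a minimal sketch of LCC post-processing using MONAI's KeepLargestConnectedComponent; the argmax step, the batch loop, and the 13-organ label list are assumptions for illustration, not the exact script used for the paper:

import torch
from monai.transforms import KeepLargestConnectedComponent

# logits: raw model output of shape (B, 14, H, W, D) for background + 13 BTCV organs
pred = torch.argmax(logits, dim=1, keepdim=True)  # discrete label map, shape (B, 1, H, W, D)

# keep only the largest connected component of each foreground organ label
lcc = KeepLargestConnectedComponent(applied_labels=list(range(1, 14)))
pred = torch.stack([lcc(p) for p in pred])  # the transform takes one channel-first volume at a time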

hanoonaR commented 2 years ago

@tangy5, thank you for providing details on additional methods that would help in reproducing the reported number.

However, I have been facing a similar problem to @AbdelrahmanShakerYousef when making a submission on the challenge portal. Since the authors have not provided details on the transformations used at test time, specifically for the submission, I could only reproduce an average DSC of 0.75 (without additional data) compared to 0.85. If the authors could kindly provide some details on the submission method or the associated post-processing, it would be very helpful for methods building on UNETR to compare against the UNETR baseline.

Your response is highly appreciated. Thank you.

tangy5 commented 2 years ago

@hanoonaR Thanks, I can provide more details on the post-processing in addition to the suggestions above. The test transform is the same as the validation transform. As far as I remember, we also tried training with a spacing of 1.0 (isotropic resolution); the higher resolution and longer training iterations help. We also trained large-scale models and selected and ensembled the best models according to validation performance. Besides, test-time augmentation such as flips and rotations can boost performance as well, and sliding-window inference with a higher overlap and Gaussian mode also helps. I guess these training/testing tricks fit general segmentation tasks. I would suggest training longer if the current DSC is ~0.75; it seems the model has not reached its optimum yet. A validation DSC of ~0.82 is normal and competitive for a single-model output.

Thank you again, I think I have covered all the details I can remember from training and testing. It would be great to see more work on this. Much appreciated.
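
To make the cross-validation ensemble suggestion concrete, here is a minimal sketch that averages softmax probabilities over the fold models; fold_models, the ROI size, and the overlap value are assumptions rather than the exact settings used for the leaderboard submission:

import torch
from monai.inferers import sliding_window_inference

ensemble_prob = None
for model in fold_models:  # hypothetical list of UNETR models, one per cross-validation fold
    model.eval()
    with torch.no_grad(), torch.cuda.amp.autocast():
        logits = sliding_window_inference(
            val_images, roi_size=(96, 96, 96), sw_batch_size=4, predictor=model,
            mode="gaussian", overlap=0.7)
    prob = torch.softmax(logits, dim=1)
    ensemble_prob = prob if ensemble_prob is None else ensemble_prob + prob

# average the fold probabilities and take the most likely class per voxel
ensemble_prob = ensemble_prob / len(fold_models)
val_outputs = torch.argmax(ensemble_prob, dim=1, keepdim=True)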

AbdelrahmanShakerYousef commented 2 years ago

Thank you @tangy5 for replying and for your valuable comments, we really appreciate it.

I just have one more question: you said the test transform is the same as the validation transform, but the validation transform applies some transformations to the label. How should this be handled in the testing transform? Also, the CropForeground transformation should not be applied at test time, right?

Kindly check the transformations we apply to both the validation and testing images.

tangy5 commented 2 years ago

@AbdelrahmanShakerYousef Yes, the pre-processing transformations are still needed: standardize the orientation, resample to the spacing resolution, scale intensities to normalize the images, and convert to tensor. CropForegroundd can be removed; there is not much difference with or without it. If you do include it, there should be an inverse transform to pad the output images back.
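
As an illustration, here is a minimal sketch of a test transform that mirrors the BTCV validation transform without the label keys; the spacing and intensity window follow the tutorial defaults and are assumptions here:

from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, Orientationd,
    Spacingd, ScaleIntensityRanged, ToTensord,
)

test_transforms = Compose([
    LoadImaged(keys=["image"]),
    EnsureChannelFirstd(keys=["image"]),
    Orientationd(keys=["image"], axcodes="RAS"),                        # standardize orientation
    Spacingd(keys=["image"], pixdim=(1.5, 1.5, 2.0), mode="bilinear"),  # resample spacing
    ScaleIntensityRanged(keys=["image"], a_min=-175, a_max=250,
                         b_min=0.0, b_max=1.0, clip=True),              # normalize intensities
    ToTensord(keys=["image"]),
])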

hanoonaR commented 2 years ago

@tangy5, thank you for the prompt response! Much appreciated.

I have a similar question to @AbdelrahmanShakerYousef: how can one apply these test-time augmentations during inference on the test samples, for which we do not have the labels? Should we apply the inversions after the predictions, for the submission?

Apologies, if I misunderstood anything. Your response is highly appreciated.

tangy5 commented 2 years ago

@hanoonaR Test-time augmentation (TTA) refers to augmenting the test images before feeding them to the model; no labels are needed. Something like this (a flip-augmentation example):

import torch
from monai.inferers import sliding_window_inference

# test time augmentation: average predictions over flipped copies of the input
# num_tta selects how many flip combinations to use (dims 2, 3, 4 are the spatial axes)
if num_tta == 0 or num_tta == 1:
    flip_tta = []
elif num_tta == 4:
    flip_tta = [[2], [3], [4]]
elif num_tta == 8:
    flip_tta = [[2], [3], [4], [2, 3], [2, 4], [3, 4], [2, 3, 4]]

# prediction on the original (un-flipped) image
ct = 1.0
with torch.cuda.amp.autocast():
    pred = sliding_window_inference(
        val_images, roi_size, sw_batch_size, model,
        mode="gaussian", overlap=overlap_ratio)

# flip the input, run inference, flip the prediction back, and accumulate
for dims in flip_tta:
    with torch.cuda.amp.autocast():
        flip_pred = torch.flip(
            sliding_window_inference(
                torch.flip(val_images, dims=dims), roi_size, sw_batch_size, model,
                mode="gaussian", overlap=overlap_ratio),
            dims=dims)
    pred += flip_pred
    ct += 1.0

# average over all augmented predictions
val_outputs = pred / ct
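
For the submission/inversion part of the question, here is a minimal sketch of inverting the pre-processing so the prediction matches the original image grid and writing a NIfTI volume; the key names, the output directory, the test_loader, and the test_transforms pipeline from the earlier sketch are assumptions:

import torch
from monai.data import decollate_batch
from monai.inferers import sliding_window_inference
from monai.transforms import AsDiscreted, Compose, Invertd, SaveImaged

post_transforms = Compose([
    # map the prediction back to the original spacing/orientation of the input image
    Invertd(keys="pred", transform=test_transforms, orig_keys="image",
            nearest_interp=False, to_tensor=True),
    AsDiscreted(keys="pred", argmax=True),
    SaveImaged(keys="pred", output_dir="./submission", output_postfix="", resample=False),
])

for test_data in test_loader:  # hypothetical DataLoader built with test_transforms
    with torch.cuda.amp.autocast():
        test_data["pred"] = sliding_window_inference(
            test_data["image"].cuda(), roi_size, sw_batch_size, model,
            mode="gaussian", overlap=overlap_ratio)
    test_data = [post_transforms(i) for i in decollate_batch(test_data)]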

hanoonaR commented 2 years ago

Great, thank you for the clarification @tangy5 . Really appreciate your support.

tangy5 commented 2 years ago

No problem. Thanks.