Hi @AbdelrahmanShakerYousef, thanks for the questions. Since this is a general question about improving performance, I can offer some suggestions:
1. A larger batch size might help; the model was trained on 32 GB x N GPUs.
2. Cross-validation model ensembling improves performance.
3. Post-processing such as keeping the largest connected component (LCC) can help remove outlier regions (see the sketch below).
4. Remove the DSC scores of missing organs.
5. Test-time augmentation helps as well.
6. The tutorial is beginner-level guidance; you may want to tune the parameters further or extend the training iterations so the model converges to an optimal state. If the training/validation fold is 0, an average DSC of 82% is a good internal validation performance.
Hope these help.
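As a rough illustration of point 3 (an assumed sketch, not the exact script we used, and assuming a recent MONAI version), something like MONAI's KeepLargestConnectedComponent can be run on the discrete prediction; the BTCV organ label range 1..13 used below is an assumption:

import torch
from monai.transforms import KeepLargestConnectedComponent

def lcc_postprocess(pred_labels: torch.Tensor) -> torch.Tensor:
    # pred_labels: discrete label map of shape (1, H, W, D) after argmax over channels
    # keep only the largest connected component per organ label to drop outlier regions
    lcc = KeepLargestConnectedComponent(applied_labels=list(range(1, 14)),
                                        is_onehot=False, independent=True)
    return lcc(pred_labels)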
@tangy5, thank you for providing details on additional methods that would help in reproducing the reported number.
However, I have been facing a similar problem as @AbdelrahmanShakerYousef when making a submission on the challenge portal. As the authors have not provided any details on the transformations used at test time, or specifically for the submission, I could only reproduce an average DSC of 0.75 (without additional data) compared to 0.85. If the authors could kindly provide some details on the submission method or the associated post-processing, it would be really helpful for methods building on UNETR to compare against the UNETR baseline.
Your response is highly appreciated. Thank you.
@hanoonaR Thanks, I can provide more details on the post-processing in addition to the suggestions above. The test transform is the same as the validation transform. As far as I remember, we also tried training with a spacing of 1.0 (isotropic resolution); the higher resolution and longer training iterations can help. We trained large-scale models and selected and ensembled the best models according to validation performance. Besides, test-time augmentation such as flipping and rotation can boost performance as well, and sliding-window inference with a higher overlap and Gaussian blending mode also helps. I guess these training/testing tricks fit general segmentation tasks. I would suggest training longer if the current DSC is ~0.75; it seems the model has not reached its optimum yet. A validation DSC of ~0.82 is normal and competitive for a single-model output.
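For instance, an ensemble over the cross-validation checkpoints could look roughly like this (an assumed sketch of the workflow, not the released script; the model list, ROI size, and overlap value are placeholders):

import torch
from monai.inferers import sliding_window_inference

def ensemble_predict(image, models, roi_size=(96, 96, 96), sw_batch_size=4, overlap=0.75):
    # image: pre-processed tensor of shape (1, 1, H, W, D)
    # models: best checkpoint from each cross-validation fold, already in eval mode
    probs = None
    with torch.no_grad(), torch.cuda.amp.autocast():
        for model in models:
            logits = sliding_window_inference(image, roi_size, sw_batch_size, model,
                                              mode="gaussian", overlap=overlap)
            p = torch.softmax(logits, dim=1)
            probs = p if probs is None else probs + p
    return probs / len(models)  # averaged class probabilities; argmax gives the final labels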
Thank you again; I think I have explained all the details I can remember from training and testing. It would be great to see more work building on this. Much appreciated.
Thank you @tangy5 for replying and for your valuable comments, we really appreciate it.
I just have one more question: you said that the test transform is the same as the validation transform, but the validation transform applies some transformations to the label. How is this handled in the testing transform? Also, the CropForeground transformation should not be applied during testing, right?
Kindly check the transformations we apply for both validation and testing.
@AbdelrahmanShakerYousef Yes, the pre-processing transformations are still needed: standardized orientation, resampling to the target spacing, intensity scaling to normalize the images, and conversion to tensors. CropForegroundd can be removed; there is not much difference with or without it. If you do include it, there should be an invert transform to pad the output images back to the original grid.
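A minimal test-time pre-processing pipeline along those lines might look like this (the spacing and intensity values below are assumptions following common BTCV tutorial settings; mirror whatever you used for training):

from monai.transforms import (Compose, EnsureChannelFirstd, EnsureTyped, LoadImaged,
                              Orientationd, ScaleIntensityRanged, Spacingd)

test_transforms = Compose([
    LoadImaged(keys=["image"]),                             # image only, no labels at test time
    EnsureChannelFirstd(keys=["image"]),
    Orientationd(keys=["image"], axcodes="RAS"),            # standardized orientation
    Spacingd(keys=["image"], pixdim=(1.5, 1.5, 2.0), mode="bilinear"),  # resample spacing
    ScaleIntensityRanged(keys=["image"], a_min=-175, a_max=250,
                         b_min=0.0, b_max=1.0, clip=True),  # intensity normalization
    EnsureTyped(keys=["image"]),                            # convert to tensor
])
# If CropForegroundd is kept, apply monai.transforms.Invertd to the prediction
# so the output is padded back to the original image grid before submission.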
@tangy5, thank you for the prompt response! Much appreciated.
I have a similar question as @AbdelrahmanShakerYousef: how can one apply these test-time augmentations during inference on the test samples for which we do not have the labels? Should we apply the inversions after the predictions for the submission?
Apologies if I misunderstood anything. Your response is highly appreciated.
@hanoonaR Test-time augmentation (TTA) refers to augmenting the test images before feeding them to the model; no labels are needed. Something like this (a flip-augmentation example):
# test time augmentation: average predictions over flipped copies of the input
import torch
from monai.inferers import sliding_window_inference

if num_tta == 0 or num_tta == 1:
    flip_tta = []  # no extra augmentation
elif num_tta == 4:
    flip_tta = [[2], [3], [4]]  # flip each spatial axis once
elif num_tta == 8:
    flip_tta = [[2], [3], [4], (2, 3), (2, 4), (3, 4), (2, 3, 4)]  # all flip combinations

ct = 1.0
with torch.cuda.amp.autocast():
    # prediction on the original (un-flipped) image
    pred = sliding_window_inference(val_images, roi_size, sw_batch_size, model,
                                    mode="gaussian", overlap=overlap_ratio)
for dims in flip_tta:
    with torch.cuda.amp.autocast():
        # flip the input, predict, then flip the prediction back before accumulating
        flip_pred = torch.flip(
            sliding_window_inference(torch.flip(val_images, dims=dims), roi_size,
                                     sw_batch_size, model, mode="gaussian",
                                     overlap=overlap_ratio),
            dims=dims)
    pred += flip_pred
    ct += 1.0
val_outputs = pred / ct  # averaged logits; apply argmax over channels for discrete labels
Great, thank you for the clarification @tangy5 . Really appreciate your support.
No problem. Thanks.
Hello,
Thank you for sharing your work and the codebase.
I am trying to reproduce the UNETR results for the BTCV dataset reported in Table 1 (without using any additional data, on the standard leaderboard). The number in the paper is 0.856; however, when I retrained UNETR on the BTCV training data and submitted, the average was 0.765. I am wondering if you could please share the testing or submission scripts so that the 0.856 can be reproduced without any additional data. Also, you only release the training and validation transformations; what about the testing transformation used for submission? Is it the same as the validation transformation?
Thanks!