Closed hanoonaR closed 2 years ago
I see a similar issue on another dataset (similar to BTCV): these two models perform even worse than 3D UNet.
Hi @hanoonaR
Thanks for the comments. The validation accuracy reported in the repo is for a single fold, obtained by splitting the publicly available training data. For the leaderboard, however, following the approach of the previous SOTA for both models, we used an ensemble of 20 models and extra private training data, which increases the number of training samples to 80 volumes. This is described in Section 4.3 of the UNETR paper.
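For clarity, ensembling of this kind is often done by averaging per-model softmax probability maps before taking the argmax label. Below is a minimal, hedged sketch of that idea; the function name, array shapes, and the averaging strategy are illustrative assumptions, not the exact leaderboard pipeline.

```python
import numpy as np

def ensemble_predict(prob_maps):
    """Average per-model probability maps and take the argmax label.

    prob_maps: list of arrays, each of shape (num_classes, D, H, W).
    Returns a label map of shape (D, H, W).
    """
    # Stack to (num_models, num_classes, D, H, W), average over models,
    # then pick the most probable class per voxel.
    mean_probs = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return np.argmax(mean_probs, axis=0)

# Toy example: 3 "models", 2 classes, a single voxel.
preds = [
    np.array([[[[0.6]]], [[[0.4]]]]),
    np.array([[[[0.2]]], [[[0.8]]]]),
    np.array([[[[0.3]]], [[[0.7]]]]),
]
print(ensemble_predict(preds))  # averaged probabilities favor class 1
```

In practice the per-model probabilities would come from sliding-window inference on each trained fold, but the averaging step is the same.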
I hope this addresses the concerns.
Best,
Hi @lbf4616
Thanks for the comment. In our experience with both models, each task requires careful tuning to achieve optimal performance. For instance, the choice of optimizer (AdamW in our case), the learning rate scheduler, and the number of epochs can all play an important role in achieving the best performance.
Overall, on most MSD tasks and on BTCV, our model consistently outperformed UNet and nnU-Net when carefully tuned.
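To make the scheduler point concrete, here is a minimal sketch of a linear-warmup plus cosine-annealing learning-rate schedule, a common pairing with AdamW. The base learning rate, warmup length, and total steps are assumptions for illustration, not the authors' exact settings.

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=50):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # small LR at the start of warmup
print(cosine_lr(50, 1000))    # full base_lr right after warmup
print(cosine_lr(1000, 1000))  # decays to ~0 at the end of training
```

In a PyTorch training loop this function would feed each optimizer step, e.g. by setting `param_group["lr"]` before `optimizer.step()`.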
Best,
Hi, thank you for sharing your wonderful work. Could you please provide some clarity on the evaluation accuracy on the BTCV validation dataset?
1) For UNETR, when the provided pretrained models are evaluated on the BTCV validation set, the accuracy is 77.64, while the testing accuracy reported in the paper is 85.3. Is this large gap expected, and is it due to the difference between the validation and test datasets?
2) For Swin UNETR, when the pretrained models are evaluated on the BTCV validation set, the accuracy is 81.56. The table in the Swin UNETR README reports 81.86, compared to the 91.8 test accuracy reported in the paper. Are the numbers in the README table the validation accuracy?
Thank you.