NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

[PyTorch/Segmentation/nnUNet/] Wording and advertisement #933


FabianIsensee commented 3 years ago

Hi there,

thank you so much for the great work on the medical segmentation example using nnU-Net. This implementation is much cleaner than ours and offers a lot of functionality in a well-maintained package.

However, the communication of what is presented in the repository could use some polishing. This repository is not a reimplementation of nnU-Net and should not be advertised as such unless you can demonstrate segmentation performance parity on the full portfolio of datasets (at least the 10 datasets from the MSD, ideally more, especially including KiTS2019, ACDC, and BraTS2020). The situation I would like to avoid is that scientific publications compare their results against 'nnU-Net' using this implementation without any guarantee that it performs as well as the original.

nnU-Net is much more than the architecture. It is a holistic framework for automatically generating/configuring segmentation pipelines. All (hyper)parameters in nnU-Net were selected carefully for robustness and segmentation performance across many datasets. Any deviation from nnU-Net's parameters (and this includes, for example, the optimizer or the data augmentation) is a deviation from nnU-Net that needs to be justified and verified across many datasets. Thus, this repository provides an example of a segmentation pipeline that is inspired by nnU-Net and reimplements parts of it, but it is not a reimplementation.
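To give a rough idea of what 'self-configuring' means in practice, here is a toy sketch. The class name, heuristics, and thresholds are made up for illustration and are not nnU-Net's actual rules; the point is only that the pipeline configuration is derived from dataset statistics rather than hand-tuned per dataset:

```python
# Toy illustration only: a made-up "dataset fingerprint" -> configuration rule set.
# The real heuristics cover spacing, patch size, network topology, batch size, and
# much more; this just conveys the idea of data-driven configuration.
from dataclasses import dataclass

import numpy as np


@dataclass
class DatasetFingerprint:
    spacings: np.ndarray   # (n_cases, 3) voxel spacings in mm
    shapes: np.ndarray     # (n_cases, 3) image shapes in voxels
    num_classes: int


def configure_pipeline(fp: DatasetFingerprint) -> dict:
    """Derive a (hypothetical) pipeline configuration from dataset statistics."""
    target_spacing = np.median(fp.spacings, axis=0)          # resampling target
    median_shape = np.median(fp.shapes, axis=0)
    patch_size = np.minimum(median_shape, 128).astype(int)   # cap for GPU memory
    return {
        "target_spacing": target_spacing.tolist(),
        "patch_size": patch_size.tolist(),
        "batch_size": 2,                                      # placeholder value
        "num_classes": fp.num_classes,
    }


fp = DatasetFingerprint(
    spacings=np.array([[1.0, 1.0, 3.0], [1.0, 1.0, 2.5]]),
    shapes=np.array([[512, 512, 120], [512, 512, 150]]),
    num_classes=3,
)
print(configure_pipeline(fp))
```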

The spirit of nnU-Net is cross-dataset compatibility and out-of-the-box performance without the need to fiddle with hyperparameters. In light of this, picking a single (simple) dataset and then demonstrating how quickly it can be trained to saturation with a set of specifically optimized hyperparameters misses the point a little bit :-) Yes, nnU-Net training is slow (16 h on a single A100 GPU), but speed was never a priority in nnU-Net. Segmentation performance across all datasets is what it is about.

BraTS from MSD is not a good dataset to demonstrate any of this on (speed or accuracy). The dataset is very simple and can be trained very quickly. Your results would be much more impressive if you used KiTS2019, LiTS (Task03 from MSD), or BraTS2020 (ideally all datasets from the Nature Methods publication).

It is a bit confusing that you report 'accuracy' for training performance. You surely meant the Dice score?
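For reference, a minimal sketch of the Sørensen–Dice coefficient that segmentation results are usually reported with (my own NumPy sketch for binary masks, not code from this repository):

```python
# Minimal sketch: Dice = 2 * |pred ∩ target| / (|pred| + |target|) for binary masks.
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Sørensen–Dice coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


# Example with two toy 3D masks
pred = np.zeros((8, 8, 8), dtype=bool)
target = np.zeros((8, 8, 8), dtype=bool)
pred[2:6, 2:6, 2:6] = True
target[3:7, 3:7, 3:7] = True
print(f"Dice: {dice_score(pred, target):.3f}")
```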

It would be great if you could update your citation to our Nature Methods paper. This will increase the legitimacy of the method :-)

Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2020). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 1-9.
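If it helps, here is the same reference as a BibTeX entry, assembled only from the information above (please double-check the fields against the published version):

```bibtex
@article{isensee2020nnunet,
  author  = {Isensee, F. and Jaeger, P. F. and Kohl, S. A. and Petersen, J. and Maier-Hein, K. H.},
  title   = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
  journal = {Nature Methods},
  year    = {2020},
  pages   = {1--9}
}
```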

Best,

Fabian

pribalta commented 3 years ago

Hi @FabianIsensee, thank you for your feedback. We appreciate you taking the time to review our implementation and share your impressions with us. The goal of NVIDIA Deep Learning Examples is to publish optimized reference implementations of the most popular models in the literature, as well as those that achieve state-of-the-art results in different domains. The success of nnU-Net in recent years across many competitions made us want to include it in our repository.

Since the paper we wanted to replicate is this one: https://arxiv.org/pdf/1809.10486.pdf (which includes results on all tasks in the MSD), would it help if we included the accuracy results (Dice score) on every dataset in the MSD?

Lastly, we are happy to add the citation to the Nature Methods paper :smile: 👍