Closed Rajesh-ParaxialTech closed 5 months ago
Dear @Rajesh-ParaxialTech ,
thank you for your interest in nnDetection.
1) Yes, the detection confidence threshold is varied to construct the FROC curve. Unfortunately, we did not save the confidence thresholds.
2) Yes, the figures are calculated with TTA since it is the default setting in nnDetection.
3) LUNA represents a certain edge case for nnDetection, since it uses cross-validation for evaluation, while nnDetection was primarily developed for external test sets. On a normal external test set, nnDetection uses TTA and model ensembling to get the best possible result (like nnU-Net). In these scenarios, TTA only provides a small performance improvement, while model ensembling gives a substantial boost, probably due to Weighted Box Clustering. (Note: model ensembling was not used for LUNA.)
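To illustrate how an FROC curve is built by varying the confidence threshold, here is a minimal sketch. This is not nnDetection's actual evaluation code; the function name and input layout are assumptions for illustration. Sorting detections by confidence and taking cumulative sums is equivalent to sweeping the threshold across every detection score:

```python
import numpy as np

def froc_points(confidences, is_true_positive, n_lesions, n_scans):
    """Sweep the detection confidence threshold and return
    (false positives per scan, sensitivity) pairs for an FROC curve.

    confidences:      confidence score of each detection (all scans pooled)
    is_true_positive: 1 if the detection matched a ground-truth lesion, else 0
    n_lesions:        total number of ground-truth lesions across all scans
    n_scans:          total number of scans evaluated
    """
    # Sort detections by descending confidence; each position in the
    # sorted list corresponds to one threshold value.
    order = np.argsort(-np.asarray(confidences, dtype=float))
    hits = np.asarray(is_true_positive, dtype=float)[order]

    tp = np.cumsum(hits)          # true positives kept at each threshold
    fp = np.cumsum(1.0 - hits)    # false positives kept at each threshold

    sensitivity = tp / n_lesions
    fps_per_scan = fp / n_scans
    return fps_per_scan, sensitivity

# Toy example: 4 detections over 2 scans with 2 lesions in total.
fps, sens = froc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0],
                        n_lesions=2, n_scans=2)
# fps  -> [0.0, 0.5, 0.5, 1.0]
# sens -> [0.5, 0.5, 1.0, 1.0]
```

Reporting sensitivity at fixed operating points (e.g. 1/8, 1/4, ... FP per scan, as in the LUNA16 challenge) then amounts to interpolating `sens` at those `fps` values.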
Best, Michael
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hello Michael
The dataset used for training was LIDC-IDRI (around 1018 scans), and testing was conducted on the LUNA16 dataset (around 888 scans, a subset of LIDC-IDRI), right? May I know the approximate train/test split used?
Thanks Rajesh
Dear @Rajesh-ParaxialTech ,
no, we did not train on LIDC and test on LUNA; this would also not be a valid strategy, since LUNA is a subset of LIDC (please refer to the respective papers for the filtering criteria).
nnDetection is a self-configuring method, which means it derives its configuration from the underlying dataset, i.e. it needs to be trained on each new task; it is not a foundation model that trains a single network for all tasks.
This means that we trained separate networks on LIDC and on LUNA. We followed the same splits as the official LUNA challenge (and the DeepLung repository), and the LIDC split is provided in the nnDetection repository: https://github.com/MIC-DKFZ/nnDetection/tree/main/projects/Task012_LIDC .
Thank you for the nice nnDetection model.
In the figure above (shown in https://github.com/MIC-DKFZ/nnDetection/blob/main/docs/results/source/v001/luna.png), was the detection confidence threshold used as the parameter to control the sensitivity vs. false positive rate (FPR) tradeoff? What confidence thresholds were used to arrive at each of the FPRs?
Were these figures computed with test-time augmentations (TTA)? Were 8 (i.e. 7 additional) TTAs applied by mirroring along each axis? Do you have a sense of the accuracy improvement TTA contributes to these curves? Inference with 8 flips takes 8x the amount of time.
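For reference, the 8-flip TTA scheme mentioned above can be sketched as follows. This is a simplified, segmentation-style illustration and an assumption, not nnDetection's actual implementation (for detection, box coordinates would additionally need to be mirrored back, and nnDetection fuses detections via Weighted Box Clustering rather than plain averaging). The 8 variants arise from all on/off combinations of flips along the three spatial axes:

```python
from itertools import product

import numpy as np

def tta_flip_predictions(volume, predict):
    """Run `predict` on all 8 mirror variants of a 3D volume
    (2^3 combinations of flips along the three axes), undo each
    flip on the output, and average the results."""
    outputs = []
    for flips in product([False, True], repeat=3):
        axes = tuple(i for i, flip in enumerate(flips) if flip)
        # Flip the input, predict, then flip the prediction back.
        flipped = np.flip(volume, axis=axes) if axes else volume
        pred = predict(flipped)
        outputs.append(np.flip(pred, axis=axes) if axes else pred)
    return np.mean(outputs, axis=0)
```

Since each of the 8 variants requires a full forward pass, inference cost scales linearly with the number of augmentations, which matches the 8x runtime observation above.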
Thank you Rajesh