MIC-DKFZ / nnDetection

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.
Apache License 2.0

[Question] How can I add false positive reduction components? #68

Closed nengwp closed 2 years ago

nengwp commented 2 years ago

:question: Question

False positive reduction (FPR) is very important for object detection, for example applying an FPR stage on the LUNA16 dataset.

Can I add such a component, and if so, how should I go about it?

mibaumgartner commented 2 years ago

Dear @nengwp ,

nnDetection does not include a false positive reduction stage since it was out of scope of the initial publication, and I'm not sure how much performance can be gained by it, for two reasons: 1) I'm not sure that an FPR network can learn features that cannot already be learned during training of the detection network (it does not see any new information); 2) designing task-specific FPR stages (e.g. using projections of the proposals) is not suitable for the scope of nnDetection, since we expect the components to work across many datasets.

Of course, you could simply train your own FPR stage and use the predictions generated by nnDetection for its training and inference.
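For illustration, here is a minimal sketch of what such a standalone FPR stage could look like, assuming the nnDetection candidates have already been exported as boxes with a matched/unmatched label per candidate. All names, shapes and the export format below are hypothetical and not part of nnDetection's API:

```python
# Hypothetical standalone FPR stage: crop a patch around each detection
# candidate and train a small 3D CNN to classify it as true vs. false positive.
import numpy as np
import torch
import torch.nn as nn


def crop_patch(volume, box, size=32):
    """Crop a fixed-size cube around the centre of a candidate box (z1, y1, x1, z2, y2, x2)."""
    half = size // 2
    z, y, x = [int((box[i] + box[i + 3]) / 2) for i in range(3)]
    patch = volume[max(z - half, 0): z + half,
                   max(y - half, 0): y + half,
                   max(x - half, 0): x + half]
    pads = [(0, size - s) for s in patch.shape]  # pad if the crop hit a border
    return np.pad(patch, pads).astype(np.float32)


class SmallFPRNet(nn.Module):
    """Tiny 3D CNN that scores a candidate patch (logit for true positive)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


# Toy training step with placeholder data (replace with exported predictions).
volume = np.random.rand(128, 256, 256).astype(np.float32)  # placeholder scan
boxes = np.array([[40, 100, 100, 56, 116, 116]])            # placeholder candidate box
labels = torch.tensor([[1.0]])                              # 1 = matched a GT object

patches = torch.from_numpy(np.stack([crop_patch(volume, b) for b in boxes]))[:, None]
model = SmallFPRNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = nn.BCEWithLogitsLoss()(model(patches), labels)
loss.backward()
optimizer.step()
```

At inference time the same crops would be scored and the FPR probability combined with (or used to filter) the original nnDetection confidence.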

Best, Michael

nengwp commented 2 years ago

Thank you very much for your quick reply

I asked because I found a large number of false positive predictions in the inference analysis report.

There are many false positives at iou_0.1_score_0.1, and many remain at iou_0.5_score_0.5; the stricter setting also increases the number of false negatives.

Is it possible for me to improve the results without additional FPR components?

mibaumgartner commented 2 years ago

Hi @nengwp ,

1) The analysis reports are only intended for visual inspection and are not designed to give a comprehensive impression of the performance. The model is not calibrated, and the cutoff for the confidence score (the "probability" output by the network) is chosen arbitrarily. Increasing the score threshold will always add more false negatives and reduce false positives; this is the same trade-off as for an ordinary classifier in an ROC evaluation. To gain a better intuition of the performance, I would suggest looking at the FROC curve and checking which working point is acceptable for your application (this usually depends on the underlying diagnostic goal, e.g. screening requires high sensitivity). Based on the validation set, choose the confidence score threshold that matches your clinical need. In my experience, the FROC evaluation also gives a somewhat pessimistic estimate: when looking at the FP predictions, most of them are corner cases which are still interesting to look at.
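To make the working-point selection concrete, here is a minimal sketch of an FROC-style threshold sweep on the validation set. It assumes each prediction has already been matched to ground truth at a fixed IoU; the function and variable names are illustrative, not nnDetection internals:

```python
# Hedged sketch: sweep confidence-score thresholds on the validation set and
# report (sensitivity, FPs per scan) so an operating point can be picked.
import numpy as np


def froc_working_points(scores, is_tp, n_gt, n_scans, thresholds):
    """Return (sensitivity, FPs per scan) for each candidate score threshold."""
    scores = np.asarray(scores)
    is_tp = np.asarray(is_tp, dtype=bool)
    points = []
    for t in thresholds:
        keep = scores >= t
        tp = np.count_nonzero(is_tp & keep)
        fp = np.count_nonzero(~is_tp & keep)
        points.append((tp / n_gt, fp / n_scans))
    return points


# Toy example: 3 scans, 4 ground-truth lesions, 6 predictions.
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.20]
is_tp = [1, 1, 0, 1, 0, 0]
thresholds = np.linspace(0.1, 0.9, 9)
points = froc_working_points(scores, is_tp, n_gt=4, n_scans=3, thresholds=thresholds)

# Pick e.g. the lowest threshold whose FP rate is acceptable for the application
# (here: at most 1 false positive per scan on the validation set).
for t, (sens, fp_rate) in zip(thresholds, points):
    if fp_rate <= 1.0:
        print(f"threshold {t:.1f}: sensitivity {sens:.2f}, {fp_rate:.2f} FPs/scan")
        break
```

The acceptable FP rate is a clinical choice (screening vs. confirmation), so the sweep only provides the candidate operating points, not the decision itself.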

2) The current set of parameters worked well across the datasets we looked at in our paper, but certain components are still difficult to capture in rules, and dataset-specific finetuning can usually improve the results to some degree (e.g. some datasets are more heterogeneous than others and thus benefit from more augmentation). A task-specific FPR implementation might be helpful but requires additional work and tuning.

Best, Michael

nengwp commented 2 years ago

Hi @mibaumgartner

Thank you very much for your answer, I will close this question.

Thibescobar commented 2 months ago

Based on the validation set, choose the confidence score threshold to match your clinical need. In my experience, the FROC evaluation also gives a somewhat pessimistic estimate and when looking at the FP predictions most of them are corner cases which are still interesting to look at.

Hello, thank you for these clear hints. Can you help me find these thresholds so that I can choose one for deployment? Thank you in advance, best, Thibault