Closed lucasjinreal closed 1 year ago
Thanks for raising the question. The cause is likely the focal loss used for classification; we have mentioned this in many DETR-like models. We suggest using a low threshold in these models, around 0.2-0.3.
Setting a lower threshold might introduce some unwanted false positives; is there any hack around this? From a practical point of view, low scores cause too many issues in real-world applications.
I think it is a common tradeoff between false positives and false negatives. Some other models may suffer from the same problem as well.
In our experiments, with a lower threshold like 0.2-0.3, the predicted results are still meaningful. For example, the model might output some objects omitted from the ground truth, but it rarely gives predictions that are totally wrong. A well-trained model produces nearly no real false positives. Hence, we think a lower threshold is acceptable.
One way to alleviate this is to rank the predictions category by category, and then select predictions based on the mean and variance of each category's prediction scores. Other heuristics might be helpful as well.
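A minimal sketch of that per-category heuristic (all function and parameter names here are illustrative, not from the DAB-DETR codebase): group predictions by category, then keep only those scoring above `mean + k * std` for their own category, with an absolute floor to avoid keeping pure noise.

```python
from collections import defaultdict


def filter_by_category(predictions, k=0.5, floor=0.1):
    """predictions: list of (label, score, box) tuples.

    Keeps boxes scoring above mean + k * std within their own
    category, subject to an absolute score floor. The constants
    k and floor are assumptions to be tuned per dataset.
    """
    by_cat = defaultdict(list)
    for label, score, box in predictions:
        by_cat[label].append((score, box))

    kept = []
    for label, items in by_cat.items():
        scores = [s for s, _ in items]
        mean = sum(scores) / len(scores)
        var = sum((s - mean) ** 2 for s in scores) / len(scores)
        # Adaptive per-category threshold: mean + k * std, floored.
        thresh = max(mean + k * var ** 0.5, floor)
        kept.extend((label, s, b) for s, b in items if s >= thresh)
    return kept
```

For example, a confident 0.9 "cat" detection survives even if a weak 0.2 "cat" box drags the raw scores down, because the threshold adapts per category rather than being one global cutoff.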
I think it remains a good question for the community to explore, as the problem is not unique to DAB-DETR but common to all focal-loss-based algorithms.
@SlongLiu thank u! Focal loss indeed causes lower scores.
I visualized some images using the DAB-DETR ResNet-50 pretrained model:
Since the scores trend lower, a threshold set too low produces many false positives, while one set too high easily misses important objects.
How can we make the confidence scores more reliable?