Loss cardinality does not make sense in the setting when using focal loss

IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Apache License 2.0


Hi,

It seems to me that, starting from Deformable DETR, the cardinality loss no longer makes sense. In the original DETR, the class_embed head outputs num_classes + 1 values, so it also produces a score for the no-object class. With that structure, the cardinality loss, and specifically the line in question, works as commented in the code. However, all the DETR variants from IDEA-Research that use focal loss have a class_embed head that outputs only num_classes values, so there is no output for the no-object class, and the cardinality loss ends up treating the last real class as the no-object class. Can you explain its use?
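To make the concern concrete, here is a minimal pure-Python sketch of the counting logic as I understand it. It is not the repository's code: the helper names (`argmax`, `count_non_last`) and the toy logits are made up for illustration, and I am assuming the cardinality prediction counts the queries whose top-scoring index is not the last one.

```python
def argmax(values):
    """Return the index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def count_non_last(pred_logits):
    """Count queries whose top-scoring index is NOT the last index.

    Assumption: this mirrors DETR-style cardinality counting, where the
    last index is taken to be the no-object class.
    """
    last = len(pred_logits[0]) - 1
    return sum(1 for query in pred_logits if argmax(query) != last)


# Hypothetical toy setup: 2 real classes, 3 queries, 1 true object.

# Head with num_classes + 1 outputs: the last index is no-object.
softmax_head = [
    [4.0, 0.1, 0.2],  # query 0 predicts class 0
    [0.1, 0.2, 5.0],  # query 1 predicts no-object
    [0.3, 0.1, 6.0],  # query 2 predicts no-object
]

# Head with only num_classes outputs: there is no no-object slot.
focal_head = [
    [4.0, -3.0],   # query 0 predicts class 0 with high confidence
    [-6.0, -5.0],  # background query, all logits low, argmax is class 1
    [-4.0, -7.0],  # background query, all logits low, argmax is class 0
]

softmax_count = count_non_last(softmax_head)
focal_count = count_non_last(focal_head)

print("num_classes + 1 head:", softmax_count)
print("num_classes head:", focal_count)
```

With the num_classes + 1 head the count matches the single object. With the num_classes head, whether a background query is counted depends only on which real class happens to have the larger (still low) logit, and a confident prediction of the last real class would not be counted at all.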

Also, why did you choose not to use a no-object class with focal loss? Is there an explanation that I am missing?


Loss cardinality does not make sense in the setting when using focal loss #201