fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

fix linear projection for bounding box binary classification #148

Status: Open. NouamaneTazi opened this issue 2 years ago.

NouamaneTazi commented 2 years ago

The paper says:

"The detection head is of a 3-layer FFN for bounding box regression, and a linear projection for bounding box binary classification (i.e., foreground and background)"

Doesn't that mean class_embed should have only 2 outputs? (It is used later here.)
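To make the shapes in the quoted sentence concrete, here is a minimal pure-Python sketch of a detection head with a 3-layer FFN for box regression and a single linear projection for scores. `DetectionHead`, `num_logits`, and the helper functions are hypothetical names, not the Deformable-DETR code. The weights are random, and the sketch omits the reference-point offsets that Deformable DETR adds before the final box sigmoid.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(x: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def init_layer(in_features: int, out_features: int, rng: random.Random):
    """Random weights and zero bias for one hypothetical linear layer."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_features)]
              for _ in range(out_features)]
    return weight, [0.0] * out_features


class DetectionHead:
    """Hypothetical head: 3-layer FFN for boxes plus one linear projection for scores."""

    def __init__(self, hidden_dim: int, num_logits: int, seed: int = 0):
        rng = random.Random(seed)
        self.box_layers = [init_layer(hidden_dim, hidden_dim, rng),
                           init_layer(hidden_dim, hidden_dim, rng),
                           init_layer(hidden_dim, 4, rng)]
        self.class_layer = init_layer(hidden_dim, num_logits, rng)

    def __call__(self, query: Vector):
        h = query
        for i, (w, b) in enumerate(self.box_layers):
            h = linear(h, w, b)
            if i < len(self.box_layers) - 1:  # ReLU between layers, not after the last
                h = relu(h)
        box = sigmoid(h)  # normalized (cx, cy, w, h) in [0, 1]
        logits = linear(query, *self.class_layer)  # one raw score per output
        return box, logits


binary_head = DetectionHead(hidden_dim=8, num_logits=1)
box, logits = binary_head([0.5] * 8)
```

With a sigmoid-based loss, foreground/background scoring needs only `num_logits=1`; a softmax formulation would use 2. Setting `num_logits` to the number of object classes instead gives one independent score per class, and column 0 of that output is then simply the score of the first class.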

EDIT: After further investigation, it seems my confusion comes from this line. Why do we pick the best-scoring bounding boxes based on the first class?
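To make the question concrete, here is a minimal pure-Python sketch contrasting two ways of picking the top-k proposals from a [num_proposals][num_logits] score matrix: ranking by column 0 only (the behavior the question describes) versus ranking by each proposal's best class. The function names and example scores are hypothetical and are not taken from the Deformable-DETR codebase.

```python
import heapq
from typing import List, Sequence


def topk_by_first_class(logits: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k proposals with the largest column-0 score, highest first."""
    return heapq.nlargest(k, range(len(logits)), key=lambda i: logits[i][0])


def topk_by_max_class(logits: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k proposals whose best score over all columns is largest."""
    return heapq.nlargest(k, range(len(logits)), key=lambda i: max(logits[i]))


# Hypothetical scores for 4 proposals and 2 classes.
example_logits = [[0.1, 2.0], [1.5, 0.0], [0.9, 0.3], [-1.0, 3.0]]

print(topk_by_first_class(example_logits, 2))  # [1, 2]
print(topk_by_max_class(example_logits, 2))    # [3, 0]
```

Because the sigmoid is monotonic, ranking raw logits gives the same order as ranking sigmoid probabilities. The two strategies only agree when column 0 is trained to mean "foreground"; otherwise ranking by column 0 prefers proposals that look like the first class.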

NouamaneTazi commented 2 years ago

Related to #79