fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

What do the four values of pred_box mean? #217

Open small-code-cat opened 10 months ago

small-code-cat commented 10 months ago

Does anyone know what the four values of the box predicted by Deformable-DETR mean? Is it (cx, cy, w, h), or does it need to be decoded again?

JacobBITLABS commented 9 months ago

If you mean outputs['pred_boxes'], the four values are (cx, cy, w, h): the center coordinates of the box followed by its width and height. They are output as relative values in [0, 1] with respect to the inference input size, so they need to be scaled to absolute values according to that size.
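As a rough illustration of that scaling step, here is a minimal sketch in plain Python. The helper name `cxcywh_to_xyxy_abs` and the example image size are assumptions made for this example, not names from the repository; it only assumes the box is a normalized (cx, cy, w, h) tuple as described above.

```python
def cxcywh_to_xyxy_abs(box, img_w, img_h):
    """Convert one normalized (cx, cy, w, h) box to absolute
    (x0, y0, x1, y1) pixel coordinates.

    Hypothetical helper for illustration; img_w and img_h are the width and
    height (in pixels) of the image the coordinates should refer to.
    """
    cx, cy, w, h = box
    # Corners in normalized units, then scaled by the image size.
    x0 = (cx - w / 2) * img_w
    y0 = (cy - h / 2) * img_h
    x1 = (cx + w / 2) * img_w
    y1 = (cy + h / 2) * img_h
    return (x0, y0, x1, y1)


# Example: a box centered in a 640x480 image, a quarter of the width
# and half of the height.
box = (0.5, 0.5, 0.25, 0.5)
print(cxcywh_to_xyxy_abs(box, img_w=640, img_h=480))
# -> (240.0, 120.0, 400.0, 360.0)
```

With real model output you would apply the same arithmetic to every row of the predicted boxes for an image, typically as tensor operations rather than one box at a time.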