facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

I think there are some errors in the posted code #619

Open nothing7744 opened 6 months ago

nothing7744 commented 6 months ago

Instructions To Reproduce the 🐛 Bug:

  1. In the dataset processing code:
       boxes[:, 2:] += boxes[:, :2]
       boxes[:, 0::2].clamp_(min=0, max=w)
       boxes[:, 1::2].clamp_(min=0, max=h)
     From this we can easily see that the bounding box annotations are in (x, y, x, y) format.
  2. But when the annotations are used to calculate the loss:
       loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(
           box_ops.box_cxcywh_to_xyxy(src_boxes),
           box_ops.box_cxcywh_to_xyxy(target_boxes)))
     the code calls box_ops.box_cxcywh_to_xyxy(target_boxes), which means the bounding box annotations are expected in (cx, cy, w, h) format. This is a serious contradiction. Am I misinterpreting it, or is there a real problem with the code?

nothing7744 commented 6 months ago

This one really confuses me a lot, so I'd love for someone to answer this question.

nothing7744 commented 6 months ago

I've actually posted this question before but it wasn't answered

nothing7744 commented 6 months ago

Still no answer to my question.

nothing7744 commented 6 months ago

Still no answer to my question today.

WrinkleXuan commented 6 months ago

hahaha you are so cute

jveitchmichaelis commented 3 months ago

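This is handled by the transformation/augmentation pipeline. The function box_xyxy_to_cxcywh (in util/box_ops.py) does the conversion; it is called by the Normalize transform when the data are transformed before being passed to the model. By the time the loss is computed, the target boxes are therefore already in (cx, cy, w, h) format, scaled by the image width and height.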

https://github.com/facebookresearch/detr/blob/3af9fa878e73b6894ce3596450a8d9b89d918ca9/datasets/transforms.py#L242
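To make the sequence of formats concrete, here is a minimal plain-Python sketch (no torch, one box at a time) of the path a COCO annotation takes. The names box_xyxy_to_cxcywh and box_cxcywh_to_xyxy mirror the functions mentioned in this thread, but the bodies are simplified; coco_xywh_to_xyxy and normalize_box are hypothetical helper names introduced only for this illustration, and the image size and box values are made up.

```python
# Illustrative sketch only: single boxes as tuples instead of torch tensors.

def coco_xywh_to_xyxy(box, img_w, img_h):
    """Dataset step (hypothetical helper): COCO (x, y, w, h) -> clamped (x0, y0, x1, y1)."""
    x, y, w, h = box
    corners = (x, y, x + w, y + h)
    limits = (img_w, img_h, img_w, img_h)
    # Clamp each coordinate to the image bounds, like the clamp calls in the dataset code.
    return tuple(max(0.0, min(float(c), float(m))) for c, m in zip(corners, limits))

def box_xyxy_to_cxcywh(box):
    """Corners -> center format."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)

def box_cxcywh_to_xyxy(box):
    """Center format -> corners (what the loss applies to both predictions and targets)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

def normalize_box(box_xyxy, img_w, img_h):
    """Normalize step (hypothetical helper): corners -> (cx, cy, w, h) divided by image size."""
    cx, cy, w, h = box_xyxy_to_cxcywh(box_xyxy)
    return (cx / img_w, cy / img_h, w / img_w, h / img_h)

img_w, img_h = 640, 480                       # made-up image size
coco_box = (100.0, 50.0, 200.0, 100.0)        # made-up COCO annotation (x, y, w, h)

xyxy = coco_xywh_to_xyxy(coco_box, img_w, img_h)   # format after the dataset step
target = normalize_box(xyxy, img_w, img_h)         # format the model and loss receive
loss_input = box_cxcywh_to_xyxy(target)            # format passed to generalized_box_iou

print("dataset (xyxy):     ", xyxy)
print("target (cxcywh/norm):", target)
print("loss input (xyxy):  ", loss_input)
```

Under these assumptions the box is in corner format only between the dataset step and the Normalize step, so converting the targets back with box_cxcywh_to_xyxy inside the loss is consistent rather than contradictory.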

Similarly, in the HuggingFace DETR implementation (which is partly copied from this repository), there is an additional step in the image processor that calls normalize_annotation:

https://github.com/huggingface/transformers/blob/0290ec19c901adc0f1230ebdccad11c40af026f5/src/transformers/models/detr/image_processing_detr.py#L184