This line asks the model to learn absolute value of w, h from output. But NO supervision is applied. Besides, the 'output' tensor is used to learn the OFFSET of bbox (x, y, w, h).
So, I am wondering whether the model can learn width and height as expected?
I have some doubts on line https://github.com/IDEA-opensource/DAB-DETR/blob/main/models/DAB_DETR/transformer.py#L242 .
This line asks the model to learn absolute value of w, h from output. But NO supervision is applied. Besides, the 'output' tensor is used to learn the OFFSET of bbox (x, y, w, h).
So, I am wondering whether the model can learn width and height as expected?