alirezazareian / ovr-cnn

A new framework for open-vocabulary object detection, based on maskrcnn-benchmark
MIT License

questions about bbox head output shape #13

Closed: yestinl closed this issue 2 years ago

yestinl commented 2 years ago

❓ Questions and Help

Why do you keep the bbox head output dimension equal to 8? I noticed that when the loss is computed with "cls_agnostic_bbox_reg=True", only the last 4 dimensions are passed to the loss. Why not set the output dimension of the bbox head directly to 4?
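To make the question concrete, here is a minimal sketch of how the 4 regression outputs used in the loss could be picked out of each RoI's prediction row. It is modeled on our reading of maskrcnn-benchmark's box loss (class-agnostic: the foreground slot at indices 4..7; class-specific: the 4 outputs at offset 4 * label). It is not copied from that code, and the function name `select_box_deltas` is hypothetical.

```python
from typing import List, Sequence

BOX_DIM = 4  # 4 regression deltas per box

def select_box_deltas(
    bbox_pred: Sequence[Sequence[float]],
    labels: Sequence[int],
    cls_agnostic: bool,
) -> List[List[float]]:
    """Return the 4 regression outputs that enter the loss for each positive RoI.

    bbox_pred: one row per RoI, each of length num_bbox_reg_classes * 4.
    labels: class index per RoI (0 = background, > 0 = foreground).
    """
    selected = []
    for row, label in zip(bbox_pred, labels):
        if label <= 0:
            continue  # background RoIs get no box-regression loss
        # Class-agnostic: 2 slots (background, foreground) x 4 = 8 outputs,
        # and only the foreground slot (indices 4..7) is ever read.
        # Class-specific: read the 4 outputs belonging to the RoI's label.
        start = BOX_DIM if cls_agnostic else BOX_DIM * label
        selected.append(list(row[start:start + BOX_DIM]))
    return selected

row = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]
print(select_box_deltas([row, row], [0, 3], cls_agnostic=True))
# [[1.0, 1.1, 1.2, 1.3]]
```

The first 4 values of each row (the background slot) are never read in the class-agnostic case, which is the redundancy the question points out.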

alirezazareian commented 2 years ago

You are right: there is no reason for it to be 8 when we only need 4. However, I did not implement that part; I simply used the original maskrcnn-benchmark implementation. I was confused too, but it's probably to maintain generality with the multi-class case, where there is one background class for which bounding box coordinates are generated and then discarded. The class-agnostic head keeps the same layout with two classes (background and foreground), so it produces 2 × 4 = 8 outputs, of which only the last 4 are used. There is no harm in that except a little extra computation, but there is no advantage either.
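The "little extra computation" can be quantified with a small sketch. It assumes the predictor sizes its regression layer as `num_bbox_reg_classes * 4`, with `num_bbox_reg_classes = 2` when regression is class-agnostic. This is our reading of maskrcnn-benchmark, not a quote from it. The helper names and the 1024-dim RoI feature size are hypothetical, chosen for illustration.

```python
def bbox_head_out_features(num_classes: int, cls_agnostic: bool) -> int:
    """Output width of the box-regression layer.

    num_classes includes the background class (e.g. 81 = 80 COCO classes + background).
    """
    # Class-agnostic regression still reserves a background slot,
    # so it regresses 2 "classes" x 4 coordinates = 8 outputs.
    num_bbox_reg_classes = 2 if cls_agnostic else num_classes
    return num_bbox_reg_classes * 4

def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

representation_size = 1024  # assumed RoI feature size, illustration only
current = linear_params(representation_size, bbox_head_out_features(81, cls_agnostic=True))
minimal = linear_params(representation_size, 4)
print(current, minimal, current - minimal)  # 8200 4100 4100
```

Under these assumptions, the unused background slot costs about 4K parameters in a single linear layer. That is negligible next to the rest of the detector, which is consistent with "no harm, but no advantage either."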