Closed yestinl closed 2 years ago
You are right: there is no reason for the output to be 8 when we only need 4. But I did not implement that part; I simply used the original maskrcnn-benchmark implementation. I was confused too, but it is probably there to maintain generality: in the multi-class case there is one background class, for which bounding box coordinates are generated and then discarded. There is no harm in that apart from a little extra computation, but there is no advantage either.
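To make the layout concrete, here is a minimal sketch in plain Python. It is not the maskrcnn-benchmark code; the function name `select_box_deltas` and the sample values are hypothetical, and it assumes the head emits 4 values per regression slot with slot 0 reserved for background, so a class-agnostic head has 2 slots (8 outputs) and the loss reads only the last 4.

```python
# Hypothetical sketch, not the maskrcnn-benchmark implementation.
# Assumption: the box head emits 4 deltas per regression slot, slot 0 = background.

def select_box_deltas(box_regression, labels, cls_agnostic):
    """Return the 4 box deltas that would feed the regression loss.

    box_regression: one flat list per proposal, of length 4 * num_slots.
    labels: one ground-truth class index per proposal (assumed > 0, i.e. foreground).
    cls_agnostic: if True, every foreground class shares slot 1.
    """
    selected = []
    for row, label in zip(box_regression, labels):
        # Class-agnostic: 2 slots only, so the foreground deltas are row[4:8].
        slot = 1 if cls_agnostic else label
        selected.append(row[4 * slot: 4 * slot + 4])
    return selected


# Class-agnostic head: 2 slots -> 8 outputs; the first 4 (background) are never used.
agnostic_out = [[0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.4]]
picked = select_box_deltas(agnostic_out, labels=[3], cls_agnostic=True)

# Multi-class head with 3 slots (background + 2 classes) -> 12 outputs.
multi_out = [[0.0] * 4 + [1.0] * 4 + [2.0] * 4]
picked_multi = select_box_deltas(multi_out, labels=[2], cls_agnostic=False)
```

Under these assumptions, shrinking the class-agnostic head to 4 outputs would only remove the unused background slot, which matches the "no harm, no advantage" reading above.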
❓ Questions and Help
I wonder why you keep the bbox head output dim equal to 8. I notice that when the loss is calculated with "cls_agnostic_bbox_reg=True", you just forward the last 4 dims to compute the loss. Why not directly set the output dimension of bboxHead to "4"?