Question about the matching strategy for dense ground-truth situations

MohsenZand / ObjectBox

(ECCV 22 Oral) ObjectBox: From Centers to Boxes for Anchor-Free Object Detection

GNU General Public License v3.0


Question about the matching strategy for dense ground-truth situations #2

Closed anxu829 closed 1 year ago

anxu829 commented 1 year ago

Hello, thank you for sharing your code.

I think your simple and effective strategy is to assign each object center to the cell that contains it in each feature map, but I have several questions about your matching strategy.

1) Consider a situation such as aerial object detection or scene text detection, where the target objects are very dense and their centers may fall into the same cell in a smaller feature map (e.g. the stride = 32 feature map). Can these objects be assigned to the same cell? If so, which object should your model predict?

2) Also, objects whose centers fall in the same cell in the smaller feature map (s=32) can still be distinguished in the larger one (s=8). Does this cause any conflicts?

3) So I think that when I use your model I cannot use smaller feature maps such as P6 and P7, as other one-stage models do, which may slightly hurt performance when I want to detect larger objects?
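To make the collision scenario in these questions concrete, here is a minimal sketch that groups object centers by the grid cell they fall into at several strides. It is not taken from the ObjectBox repository: the function name `cell_collisions` and the example coordinates are hypothetical, and it assumes the center cell is found by integer division of the center coordinates by the stride.

```python
from collections import defaultdict


def cell_collisions(centers, stride):
    """Return the cells that contain more than one object center.

    centers: list of (cx, cy) in input-image pixels (hypothetical data).
    stride:  downsampling factor of the feature map.
    Assumption: the center cell is (floor(cx / stride), floor(cy / stride)).
    """
    cells = defaultdict(list)
    for idx, (cx, cy) in enumerate(centers):
        cells[(int(cx // stride), int(cy // stride))].append(idx)
    # Keep only cells shared by two or more objects.
    return {cell: ids for cell, ids in cells.items() if len(ids) > 1}


# Three nearby object centers, as in a dense aerial or text scene.
centers = [(100.0, 100.0), (110.0, 105.0), (125.0, 120.0)]
for stride in (8, 16, 32):
    print(stride, cell_collisions(centers, stride))
```

With these example centers, no cell is shared at stride 8, two objects share a cell at stride 16, and all three share one cell at stride 32, which is the situation the questions describe.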

MohsenZand commented 1 year ago

Please see Fig. 3 (b) and the text that follows it on overlapping objects. Because the regression targets (L, R, T, B) are always positive, you can use strides of varying sizes. In very dense scenes with several object centers in the same cell, one technique would be to divide the cell into four 'sub-cells', with one prediction assigned to each of them. Consequently, the feature maps would be Bx4xSxS (it is Bx1xSxS in our method). This could even be applied solely to the larger strides.
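The sub-cell idea described in this reply could be sketched as follows. This is only an illustration, not code from the repository: the function name `sub_cell_index` and the quadrant-to-channel ordering (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) are assumptions made for the example.

```python
def sub_cell_index(cx, cy, stride):
    """Map an object center to (grid_x, grid_y, channel) for a Bx4xSxS map.

    Assumption: each cell is split into 2x2 quadrants and the channel is
    numbered row by row (0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right).
    """
    gx, gy = int(cx // stride), int(cy // stride)
    half = stride / 2
    # 1 if the center lies in the right (or bottom) half of the cell, else 0.
    sx = int((cx - gx * stride) >= half)
    sy = int((cy - gy * stride) >= half)
    return gx, gy, 2 * sy + sx


# Three centers that all fall into cell (3, 3) at stride 32.
for cx, cy in [(100.0, 100.0), (120.0, 100.0), (100.0, 124.0)]:
    print(sub_cell_index(cx, cy, 32))
```

Here the three centers land in channels 0, 1 and 2 of the same cell, so each gets its own prediction slot. Two centers in the same quadrant would still collide, so the split reduces conflicts rather than removing them.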

anxu829 commented 1 year ago

Thank you for your reply :). So I suppose this algorithm is best suited to tasks where each 'bin' contains only one center; if there are multiple centers in a bin, I need to split it into 'sub-cells' (and regress per sub-cell)?

As I mentioned above, with dense objects in an image, the feature maps at P2-P4 are dense enough that each center falls into its own cell, but when P5 and P6 are added, some centers may gather in the same cell. Should I then use a common head for every feature map (splitting every feature-map grid into four 'sub-cells'), or add a separate head for the larger-stride feature maps (splitting only those)?

anxu829 commented 1 year ago

Another question is about the regression target: when I split the cell into four sub-cells (the blue one) and want to regress the orange object, which delta_x / delta_y should be regressed?

MohsenZand commented 1 year ago

Yes, both scenarios are possible. You have the option of treating all feature maps equally (Bx4xSxS) or merely splitting the feature maps of the larger strides. The detection module must change accordingly.

Each sub-cell is treated as a single cell for regression targets (L, R, T, B). Therefore, figure (b) is correct, and the two arrows represent R and B.
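As a rough sketch of what treating a sub-cell as a single cell might look like, the snippet below computes (L, T, R, B) for a box from the cell that contains its center. It rests on an assumption about the target definition: L and T are measured from the cell's bottom-right corner, and R and B from its top-left corner, which keeps all four values positive whenever the box center lies inside the cell. The function name `ltrb_targets` and the example box are hypothetical; check the repository for the actual definition and scaling.

```python
def ltrb_targets(box, cell_size):
    """Compute (L, T, R, B) distances from the center cell's corners.

    box:       (x1, y1, x2, y2) in input-image pixels.
    cell_size: side of the cell in pixels; for a sub-cell this is stride / 2.
    Assumption: L, T use the cell's bottom-right corner and R, B use its
    top-left corner, so every target is positive.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Top-left corner of the (sub-)cell that contains the box center.
    left = (cx // cell_size) * cell_size
    top = (cy // cell_size) * cell_size
    right, bottom = left + cell_size, top + cell_size
    return right - x1, bottom - y1, x2 - left, y2 - top


box = (90.0, 80.0, 150.0, 120.0)          # center at (120, 100)
print(ltrb_targets(box, cell_size=32))     # full cell at stride 32
print(ltrb_targets(box, cell_size=16))     # sub-cell of the same stride
```

Under this assumption, switching from a cell to a sub-cell changes only the reference corners; the form of the targets stays the same.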

anxu829 commented 1 year ago

OK, thanks for your reply. This work is very interesting 👍