Closed (zhangchenghua123 closed this issue 4 years ago)
Yes. The mismatch comes from the difference between given-GT-box extraction and direct extraction. In short, the two do not use the same bounding boxes to get the features: given-GT-box extraction uses boxes after box regression, while direct extraction uses boxes before box regression. Since the features are pooled from different regions, the object and attribute categories predicted from them can differ as well.
I explain the reason in the README file: https://github.com/airsplay/py-bottom-up-attention#proof-of-correctness
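To make the before/after box regression point concrete, here is a minimal NumPy sketch. It is not code from this repo: `apply_deltas` and `pool_box` are hypothetical helpers, the deltas use the common Faster R-CNN (dx, dy, dw, dh) parametrization with unit weights as an assumption, and a plain mean over the box stands in for real RoI pooling.

```python
import numpy as np

def apply_deltas(box, deltas):
    """Apply (dx, dy, dw, dh) regression deltas to an (x1, y1, x2, y2) box.

    Assumes the usual Faster R-CNN parametrization with unit weights.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * np.exp(dw), h * np.exp(dh)
    return np.array([
        new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w, new_cy + 0.5 * new_h,
    ])

def pool_box(feature_map, box):
    """Toy stand-in for RoI pooling: mean of the feature map inside the box."""
    x1, y1, x2, y2 = np.round(box).astype(int)
    return feature_map[y1:y2, x1:x2].mean()

# Toy 10x10 single-channel "feature map" (made-up values).
feature_map = np.arange(100, dtype=float).reshape(10, 10)

proposal = np.array([2.0, 2.0, 6.0, 6.0])   # box before regression
deltas = np.array([0.25, 0.0, 0.0, 0.0])    # made-up regression output
refined = apply_deltas(proposal, deltas)    # box after regression

feat_before = pool_box(feature_map, proposal)  # what direct extraction pools from
feat_after = pool_box(feature_map, refined)    # what given-box extraction pools from

print(refined, feat_before, feat_after)
```

Even a shift of one pixel in the box changes the pooled feature, so a classifier head fed these two features can output different object or attribute categories for what is reported as the same final box.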
Since Bottom Up Attention only provides the GT boxes and their features, I ran the model it provides to get the object and attribute categories of the GT boxes. Here is an example. When I used the pre-trained model you provided to extract the object and attribute categories of this image for the given GT boxes, I found that some categories differed from those obtained with the original Bottom Up Attention. The image is MSCOCO2014/VAL2014/COCO_val2014_00000039185.jpg. If you have time, could you verify it? I would like to know why the two differ.