Closed by shan-jianqing 1 month ago
Hi @shan-jianqing,
Thanks for taking an interest in our work.
I suspect that the multi-level features in H-DETR are not being exploited effectively. These more advanced detectors all use deformable transformers and five levels of image features. We explored these multi-level features in Appendix B of the paper, but were unable to get any significant improvement. So it's possible that, even though H-DETR's object detection quality is better, the visual features it extracts are not as discriminative.
Hope that helps.
Cheers, Fred.
Hello Fred, thanks for your great work.
I noticed that in the paper you say: "The performance of our method with H-DETR-R50 was surprisingly lower than that with DETR-R50, although H-DETR-R50 outperforms DETR-R50 significantly in terms of object detection mAP on HICO-DET." I have the same problem when experimenting on HOI with a different object detector. What do you think causes this, and how might it be solved?
Thanks in advance.