fredzzhang / pvic

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
BSD 3-Clause "New" or "Revised" License

Problem about detector performance against hoi performance #57

Closed by shan-jianqing 1 month ago

shan-jianqing commented 1 month ago

Hello Fred, thanks for your great work.

I noticed that the paper says: "The performance of our method with H-DETR-R50 was surprisingly lower than that with DETR-R50, although H-DETR-R50 outperforms DETR-R50 significantly in terms of object detection mAP on HICO-DET." I ran into the same problem when experimenting on HOI detection with a different object detector. What do you think causes this, and how might it be solved?

Thanks in advance.

fredzzhang commented 1 month ago

Hi @shan-jianqing,

Thanks for taking an interest in our work.

My suspicion is that the multi-level features in H-DETR are not being exploited effectively. These more advanced detectors all use deformable transformers and five levels of image features. We explored these multi-level features in Appendix B of the paper, but were unable to get any significant improvement. So it's possible that, even though H-DETR gives better object detection quality, the visual features extracted from it are not as discriminative.

Hope that helps.

Cheers, Fred.
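
For readers unfamiliar with the multi-level features mentioned in the reply above, here is a minimal PyTorch sketch of one simple way to collapse a feature pyramid into a single map: resize every level to a common resolution and average them. This is only an illustration, not the code in this repository and not necessarily what Appendix B of the paper tried. The function name `fuse_multilevel`, the channel count, the spatial sizes, and the halving of resolution per level are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def fuse_multilevel(features, target_level=0):
    """Resize every level to the resolution of `target_level` and average.

    `features` is a list of tensors shaped (B, C, H_i, W_i) that share the
    same batch size and channel count. This helper is hypothetical.
    """
    size = features[target_level].shape[-2:]
    resized = [
        F.interpolate(f, size=size, mode="bilinear", align_corners=False)
        for f in features
    ]
    # Stack along a new leading axis, then average over the levels.
    return torch.stack(resized, dim=0).mean(dim=0)


# Assumed five-level pyramid: 256 channels, resolution halved at each level.
features = [torch.randn(1, 256, 64 // 2**i, 96 // 2**i) for i in range(5)]
fused = fuse_multilevel(features)
print(fused.shape)
```

Plain averaging treats every level equally, which may wash out fine detail from the high-resolution levels; learned weights or concatenation followed by a projection are other options one could try.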