Problem about detector performance against hoi performance

fredzzhang / pvic

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"

BSD 3-Clause "New" or "Revised" License


Hi @shan-jianqing,

Thanks for taking an interest in our work.

I suspect that the multi-level features in H-DETR are not being exploited effectively. These more advanced detectors all use deformable transformers and five levels of image features. We explored these multi-level features in Appendix B of the paper, but were unable to get any significant improvement. So it's possible that, even though object detection quality is better with H-DETR, the extracted visual features are not as discriminative.
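To give a rough sense of what "five levels of image features" means in terms of scale, here is a minimal sketch that counts the spatial positions (tokens) at each level of a feature pyramid. It is for illustration only and is not taken from the PViC or H-DETR code: the function name `level_token_counts`, the image size of 800x1216, and the strides (8, 16, 32, 64, 128) are all assumptions chosen for the example.

```python
import math


def level_token_counts(height, width, strides):
    """Return (stride, h, w, tokens) for each feature level of an image.

    Each level downsamples the image by its stride, so the spatial size
    is the ceiling of the image size divided by the stride.
    """
    levels = []
    for stride in strides:
        h = math.ceil(height / stride)
        w = math.ceil(width / stride)
        levels.append((stride, h, w, h * w))
    return levels


# Hypothetical values, chosen only for illustration.
IMAGE_HEIGHT, IMAGE_WIDTH = 800, 1216
SINGLE_LEVEL = (32,)                 # one coarse feature map
FIVE_LEVELS = (8, 16, 32, 64, 128)   # assumed strides for five levels

single = level_token_counts(IMAGE_HEIGHT, IMAGE_WIDTH, SINGLE_LEVEL)
multi = level_token_counts(IMAGE_HEIGHT, IMAGE_WIDTH, FIVE_LEVELS)

single_total = sum(tokens for _, _, _, tokens in single)
multi_total = sum(tokens for _, _, _, tokens in multi)

for stride, h, w, tokens in multi:
    print(f"stride {stride:>3}: {h} x {w} = {tokens} tokens")
print(f"single level total: {single_total}")
print(f"five levels total:  {multi_total}")
```

Under these assumed settings, the multi-level case spreads the visual information over far more positions, at several resolutions, than a single coarse map does. Anything that consumes those features downstream has to decide how to combine the levels, which is the part that may not be handled effectively.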

Hope that helps.

Cheers, Fred.


Problem about detector performance against hoi performance #57