Artanic30 / HOICLIP

CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Question about the verb class representation #5

Closed (hangzhiyiwei closed this issue 1 year ago)

hangzhiyiwei commented 1 year ago

Hi, thank you for your nice work.

I have a question about the verb class representation.

[image]

In the figure above, are the human and object bounding boxes taken from the ground truth? If so, does that mean you use images of unseen HOI classes, together with their ground-truth bounding boxes, for training?

Thank you very much.

hangzhiyiwei commented 1 year ago

Hi, I just want to make sure. Actually, the EoID paper (End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation, AAAI 2023) uses the GT bounding boxes of unseen HOI class images for training.

Artanic30 commented 1 year ago

Hi, it has been a busy week for me. We extract all verb class representations from the training data in the default (fully supervised) HOI setting. So yes, it is true that we use the GT bounding boxes of unseen HOI class images for training.
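
To make the distinction concrete, here is a minimal sketch (not the HOICLIP code). It assumes each verb class representation is simply the mean of per-instance feature vectors computed from ground-truth human-object box pairs. The function name, the tuple format, and the `excluded_hoi` parameter are hypothetical and only serve the illustration. With the default call every training instance contributes, including instances of unseen HOI classes; passing the unseen set shows what a stricter extraction would look like.

```python
from collections import defaultdict


def verb_class_representations(instances, excluded_hoi=frozenset()):
    """Average per-instance feature vectors by verb class.

    instances: iterable of (verb, obj, feature) tuples, where feature is a
        list of floats derived from a ground-truth human-object box pair
        (hypothetical format, not the repository's data layout).
    excluded_hoi: set of (verb, obj) pairs to skip, e.g. unseen HOI classes.
    """
    sums = {}
    counts = defaultdict(int)
    for verb, obj, feature in instances:
        if (verb, obj) in excluded_hoi:
            continue  # this instance does not contribute to its verb's mean
        if verb not in sums:
            sums[verb] = [0.0] * len(feature)
        sums[verb] = [s + f for s, f in zip(sums[verb], feature)]
        counts[verb] += 1
    return {
        verb: [s / counts[verb] for s in total]
        for verb, total in sums.items()
    }


# Toy data: two "ride" instances, one of which is treated as an unseen HOI class.
instances = [
    ("ride", "bicycle", [1.0, 0.0]),
    ("ride", "horse", [0.0, 1.0]),  # assumed unseen (verb, object) pair
    ("hold", "cup", [2.0, 2.0]),
]
unseen = {("ride", "horse")}

# All training instances contribute, as in the fully supervised setting.
default_reps = verb_class_representations(instances)
# Stricter variant: instances of unseen HOI classes are left out.
filtered_reps = verb_class_representations(instances, excluded_hoi=unseen)
```

In this toy case the "ride" representation differs between the two calls, because the unseen "ride horse" instance shifts the mean only when it is included.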

hangzhiyiwei commented 1 year ago

Oh, I understand now. Thanks for your kind reply.