hangzhiyiwei closed this issue 1 year ago.
Hi, I just want to make sure I understand correctly: the EoID paper (End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation, AAAI 2023) actually uses the GT bounding boxes of images containing unseen HOI classes for training?
Hi, it has been a busy week for me. By default, we extract all verb class representations from the training data of the standard (fully supervised) HOI setting. So yes, it's true that we use the GT bounding boxes of images containing unseen HOI classes for training.
Oh, I see. Thanks for your kind reply.
Hi, thank you for your nice work.
I have a question about the verb class representation.
In the above figure, are the bounding boxes of the human and object obtained from the ground truth? If so, does that mean you use images of unseen HOI classes and their ground-truth bounding boxes for training?
Thank you very much.