coldmanck / VidHOI

Official implementation of "ST-HOI: A Spatial-Temporal Baseline for Human-Object Interaction Detection in Videos" (ACM ICMRW 2021)
https://dl.acm.org/doi/10.1145/3463944.3469097
Apache License 2.0

Ground-truth human-object pairs in Oracle mode in training and evaluation? #5

Closed nizhf closed 2 years ago

nizhf commented 2 years ago

Thanks for your great work. I have a question about your training and evaluation process. How do you deal with human-object pairs that are not annotated in the ground truth when you train and evaluate the model in Oracle mode? Do you use only the ground-truth pairs as input, or do you consider all possible pairs? Thank you.

coldmanck commented 2 years ago

Hi @nizhf thanks for your interest in our work!

For training, regardless of whether it's in Oracle or Detection mode, we use only ground-truth person trajectories and we consider all possible pairs. So let's say there are n person boxes in a sampled frame; then we consider n(n-1) pairs. Kindly also refer to the following code snippet:

https://github.com/coldmanck/VidHOI/blob/e6b2eb0840313a6ed84aa80fdd6ea809e059d4a1/slowfast/models/head_helper.py#L279-L283
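For reference, the n(n-1) ordered pairing described above can be sketched as follows. This is a minimal illustration, not the repository's code; `enumerate_pairs` and the string box names are hypothetical.

```python
from itertools import permutations

def enumerate_pairs(boxes):
    """Enumerate all ordered (subject, object) index pairs among n boxes.

    Hypothetical helper illustrating the n(n-1) pairing: every box can act
    as the subject, paired with every other box as the object.
    """
    return list(permutations(range(len(boxes)), 2))

pairs = enumerate_pairs(["person_0", "person_1", "person_2"])
assert len(pairs) == 3 * 2  # n(n-1) ordered pairs for n = 3
```

Note that ordered pairs are used because the subject and object roles are not interchangeable in an interaction triplet.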

nizhf commented 2 years ago

Thank you for the explanation. That means that for a human-object pair without a ground-truth interaction annotation, the label is set to a 50-d zero vector, is that correct?

coldmanck commented 2 years ago

@nizhf That's right. https://github.com/coldmanck/VidHOI/blob/b0b52b9b7364ff27b29b72da5b1de661d0fe56d2/slowfast/datasets/vidor_helper.py#L221-L225
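The labeling scheme confirmed above can be sketched as a multi-hot vector per pair, where unannotated pairs stay all-zero. This is a hypothetical illustration, not the linked `vidor_helper.py` code; `make_pair_label` and `NUM_INTERACTIONS` are assumed names.

```python
import numpy as np

NUM_INTERACTIONS = 50  # number of interaction classes in VidHOI

def make_pair_label(annotated_classes):
    """Build a multi-hot interaction label for one human-object pair.

    Hypothetical sketch of the scheme discussed above: a pair with no
    ground-truth interaction annotation keeps an all-zero 50-d vector,
    so it contributes only negative targets to the multi-label loss.
    """
    label = np.zeros(NUM_INTERACTIONS, dtype=np.float32)
    for c in annotated_classes:
        label[c] = 1.0
    return label

assert not make_pair_label([]).any()        # unannotated pair -> zero vector
assert make_pair_label([3, 17]).sum() == 2  # two annotated interactions
```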