eladb3 / ORViT

"Object-Region Video Transformers", Herzig et al., CVPR 2022
Apache License 2.0
42 stars, 12 forks

How to do inference on new video? #7

Closed: Jeba-create closed 2 years ago

Jeba-create commented 2 years ago

Inference works on the validation set and reports the prediction accuracy. That is expected, since the validation set comes with bounding-box annotations. For the test set or a new input video, however, detection has to be run before classification. Could you please help me with how to do inference on a new video for which no detected bboxes are available?

Thanks in advance

eladb3 commented 2 years ago

Hi, you can use a detector to extract boxes for your dataset. Detectron2 is a great option.
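For readers following along, here is a minimal sketch of per-frame box extraction with Detectron2. It is not code from this repository. The COCO Faster R-CNN config, the `max_boxes=4` cap, and the `min_score=0.5` threshold are assumptions chosen for illustration, and the helper names `keep_top_boxes`, `build_predictor`, and `detect_frame` are hypothetical. A detector trained on classes closer to your data may work better.

```python
def keep_top_boxes(boxes, scores, max_boxes=4, min_score=0.5):
    """Keep at most max_boxes boxes with score >= min_score, best first.

    boxes is a list of [x1, y1, x2, y2]; scores is a parallel list of floats.
    Both thresholds are illustrative assumptions, not ORViT defaults.
    """
    ranked = sorted(zip(scores, boxes), key=lambda pair: pair[0], reverse=True)
    kept = [list(box) for score, box in ranked if score >= min_score]
    return kept[:max_boxes]


def build_predictor(config_name="COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"):
    """Build a Detectron2 predictor from a model-zoo config (assumed choice)."""
    # Imported lazily so keep_top_boxes can be used without Detectron2 installed.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(config_name))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)
    return DefaultPredictor(cfg)


def detect_frame(predictor, frame_bgr, max_boxes=4, min_score=0.5):
    """Run the detector on one BGR frame and return the filtered boxes."""
    instances = predictor(frame_bgr)["instances"].to("cpu")
    boxes = instances.pred_boxes.tensor.tolist()  # [[x1, y1, x2, y2], ...]
    scores = instances.scores.tolist()
    return keep_top_boxes(boxes, scores, max_boxes, min_score)
```

Calling `detect_frame` on every sampled frame of a video gives one list of boxes per frame, in absolute pixel coordinates.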

Jeba-create commented 2 years ago

Thank you so much for your suggestion. I would like to know: is code available here to do detection on the fly, or should I run detection, store the results, and then follow the same procedure as for validation?

eladb3 commented 2 years ago

Yes, you should store the detected bounding boxes and then use them for ORViT.
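As a rough illustration of the "detect once, store, then reuse" workflow, the sketch below writes one JSON file of boxes per video and reads it back. The file layout (a list of `{"frame", "boxes"}` records) and the function names `save_video_boxes` and `load_video_boxes` are hypothetical. The format ORViT actually expects is defined by the box-loading code in ssv2.py, so the stored files would need to be adapted to match it.

```python
import json
from pathlib import Path


def save_video_boxes(out_dir, video_id, per_frame_boxes):
    """Write the boxes of one video to <out_dir>/<video_id>.json.

    per_frame_boxes[i] is the list of [x1, y1, x2, y2] boxes for frame i.
    The record layout is an assumption made for this example.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = [
        {"frame": index, "boxes": [[float(v) for v in box] for box in boxes]}
        for index, boxes in enumerate(per_frame_boxes)
    ]
    path = out_dir / f"{video_id}.json"
    with open(path, "w") as handle:
        json.dump(records, handle)
    return path


def load_video_boxes(out_dir, video_id):
    """Read back the records written by save_video_boxes."""
    with open(Path(out_dir) / f"{video_id}.json") as handle:
        return json.load(handle)
```

Storing the boxes offline keeps the detector out of the data-loading loop, so new videos can go through the same path as the validation set once their files exist.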

Jeba-create commented 2 years ago

One more question, please: should I use "get_boxes_gt" or "get_boxes_detected" in the ssv2.py script to process the bounding boxes?