fredzzhang / pvic

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
BSD 3-Clause "New" or "Revised" License
69 stars 8 forks source link

video data #58

Closed LHYUNMIN closed 3 weeks ago

LHYUNMIN commented 1 month ago

Certainly, performing inference on video data is possible??

The UPT README mentions that inference has been conducted on video data or similar content.

fredzzhang commented 1 month ago

Hi @LHYUNMIN,

The model can only process an image at a time. However, you can still run the model on the frames of a video individually and aggregate the results in the end. The demo video on the UPT homepage was produced exactly that way.

Hope that answers your question.

Fred.