facebookresearch / AVT

Code release for ICCV 2021 paper "Anticipative Video Transformer"
Apache License 2.0

[Question] Video frame prediction without actions #24

Closed by austinmw 2 years ago

austinmw commented 2 years ago

If I have a set of videos with no labeled actions, can I use this to just predict the next frame in a video?

rohitgirdhar commented 2 years ago

Hi @austinmw, apologies for the delay in responding. The model performs future prediction in feature space. The predicted features are then projected into a distribution over actions using a linear layer. In principle, that linear projection could be replaced with a decoder network that generates pixel values; however, we did not experiment with that in this paper.
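To make the idea concrete, here is a rough sketch of swapping the action head for a pixel decoder. It is not code from the AVT repository: it uses NumPy instead of the repo's framework, and every name and size in it (`FEAT_DIM`, `NUM_ACTIONS`, `action_head`, `pixel_decoder`, the 8x8 frame) is a hypothetical placeholder. A random array stands in for the future feature the model would predict, and a small two-layer MLP stands in for what would realistically be a convolutional decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 16        # hypothetical size of a predicted future feature
NUM_ACTIONS = 5      # hypothetical number of action classes
H, W, C = 8, 8, 3    # hypothetical (tiny) output frame size


def init_linear(in_dim, out_dim):
    """Random weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def action_head(feat, params):
    """Linear projection + softmax: feature -> distribution over actions."""
    w, b = params
    logits = feat @ w + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def pixel_decoder(feat, params):
    """Two-layer MLP stand-in for a decoder: feature -> frame in [0, 1]."""
    (w1, b1), (w2, b2) = params
    hidden = np.maximum(feat @ w1 + b1, 0.0)             # ReLU
    pixels = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # sigmoid
    return pixels.reshape(-1, H, W, C)


# Placeholder for the future features predicted by the model (batch of 2).
future_feat = rng.normal(size=(2, FEAT_DIM))

action_params = init_linear(FEAT_DIM, NUM_ACTIONS)
decoder_params = (init_linear(FEAT_DIM, 64), init_linear(64, H * W * C))

action_probs = action_head(future_feat, action_params)    # shape (2, NUM_ACTIONS)
pred_frames = pixel_decoder(future_feat, decoder_params)  # shape (2, H, W, C)
```

Both heads consume the same predicted feature; only the output space differs. A decoder like this would presumably be trained with a reconstruction loss against the actual next frame, which is an assumption here since that setup was not tried in the paper.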