facebookresearch / SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Apache License 2.0

Maskfeat confusion #673

Open smandava98 opened 9 months ago

smandava98 commented 9 months ago

Hi. I have a question about MaskFeat. Does it load all video frames into GPU memory at once, or one frame at a time? Can I input arbitrary video frames?

alpargun commented 4 months ago

Yes. The MaskFeat paper originally targets video understanding; however, it also generalizes to images, i.e., a single frame, as mentioned in the abstract of the paper:

MaskFeat further generalizes to image input, which can be interpreted as a video with a single frame and obtains competitive results on ImageNet.

The number of frames the GPU processes per input clip therefore depends on the hyperparameter NUM_FRAMES, so you can change the number of input frames by adjusting it.
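
To illustrate how a frame-count setting determines the size of the input, here is a minimal NumPy sketch. It is not the PySlowFast data loader: the helper names `sample_frame_indices` and `build_clip`, the uniform sampling strategy, and the channels-first `(C, T, H, W)` layout are all assumptions made for illustration. It only shows that a clip built from `num_frames` frames grows linearly with that value, and that `num_frames = 1` reduces to the single-image case.

```python
import numpy as np


def sample_frame_indices(total_frames: int, num_frames: int) -> np.ndarray:
    """Pick num_frames indices spread evenly over a video (hypothetical helper)."""
    # linspace includes both endpoints; rounding maps them to valid integer indices.
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int)


def build_clip(video: np.ndarray, num_frames: int) -> np.ndarray:
    """Turn a decoded video (T_total, H, W, C) into one clip (C, num_frames, H, W).

    The channels-first layout is an assumption for this sketch.
    """
    idx = sample_frame_indices(video.shape[0], num_frames)
    clip = video[idx]                  # (num_frames, H, W, C)
    return clip.transpose(3, 0, 1, 2)  # (C, num_frames, H, W)


# Dummy decoded video: 64 frames of 32x32 RGB, kept small so the sketch runs quickly.
video = np.zeros((64, 32, 32, 3), dtype=np.float32)

clip_video = build_clip(video, num_frames=16)  # a 16-frame clip
clip_image = build_clip(video, num_frames=1)   # the single-frame "image" case

print(clip_video.shape, clip_video.nbytes)
print(clip_image.shape, clip_image.nbytes)
```

In this sketch only the sampled frames end up in the clip array, so the memory taken by one clip depends on `num_frames` rather than on the length of the source video.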