Open smandava98 opened 9 months ago
Yes. The MaskFeat paper originally targets video understanding, but it also generalizes to images, i.e., a single frame, as mentioned in the abstract of the paper:
MaskFeat further generalizes to image input, which can be interpreted as a video with a single frame and obtains competitive results on ImageNet.
The number of frames processed by the GPU per clip therefore depends on the hyperparameter NUM_FRAMES, so you can change the number of input frames by changing that value.
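To make the idea concrete, here is a minimal sketch in plain Python of how a fixed-length clip might be drawn from a video of any length, and roughly how large the input tensor for one clip would be. This is not the MaskFeat data loader: `sample_frame_indices` and `clip_input_bytes` are hypothetical helpers written for illustration, uniform sampling is an assumption, and the estimate covers only the raw float32 input, not model activations.

```python
def sample_frame_indices(total_frames, num_frames):
    """Pick num_frames evenly spaced frame indices from a video.

    Hypothetical helper: assumes uniform sampling over the whole video.
    If the video has fewer frames than num_frames, indices repeat.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        # A single frame behaves like an image input.
        return [total_frames // 2]
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


def clip_input_bytes(num_frames, height, width, channels=3, bytes_per_value=4):
    """Size in bytes of one clip tensor of shape (channels, num_frames, height, width)."""
    return channels * num_frames * height * width * bytes_per_value


# A 300-frame video and a 3000-frame video both yield a 16-frame clip,
# so the per-clip input size depends on num_frames, not on video length.
short_clip = sample_frame_indices(300, 16)
long_clip = sample_frame_indices(3000, 16)
print(len(short_clip), len(long_clip))
print(clip_input_bytes(16, 224, 224))
```

Under these assumptions, the video length only affects which frames are picked; the amount of data sent to the GPU per clip is set by the number of frames and the spatial resolution.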
Hi. I have a question about MaskFeat. Does it load all video frames into GPU memory at once, or does it process them one frame at a time? Can I input arbitrary video frames?