OpenGVLab / VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0
842 stars, 60 forks

Details on pretrained models for downstream tasks (K400). #90

Open LSanghyeok opened 2 months ago

LSanghyeok commented 2 months ago

Hi! This is intriguing work! I have a question about the K400 pre-trained models. When pre-training on K400 for the downstream tasks (e.g., Breakfast, COIN), did you initialize from ImageNet-1K pre-trained models, or did you train on K400 from scratch without image-pretrained models?

Thanks.

Andy1621 commented 2 months ago

Hi! As stated in the caption, we use ImageNet pretrained models for K400 by default. For the models marked with , we trained them from scratch with masked modeling.

image