OpenGVLab / VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
https://arxiv.org/abs/2303.16727
MIT License
445 stars · 45 forks

Pretrained smaller models availability #45

Closed ganzobtn closed 3 months ago

ganzobtn commented 7 months ago

Hello. Thank you for the great work.

  1. Could you provide the pretrained ViT-B and ViT-S models?
  2. How much GPU VRAM is required to fine-tune the pretrained ViT-G model on a custom video dataset? When I try to fine-tune it with a batch size of 1 on a V100 with 32 GB of memory, I get a CUDA out-of-memory error. Is there something wrong with what I am doing?

congee524 commented 7 months ago
  1. vit_b_hybrid_pt_800e.pth
  2. We fine-tune ViT-g with batch_size=6 on an 80 GB A100. Kindly check your PyTorch version (the newer, the better), or you could use gradient checkpointing (see the sketch below).

hope it helps.
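
For reference, a minimal sketch of what "gradient checkpointing" means in plain PyTorch: each transformer block is wrapped in `torch.utils.checkpoint`, so its activations are recomputed during the backward pass instead of being stored, trading extra compute for memory. The repository's fine-tuning script may expose its own flag for this; the `Block`/`Encoder` classes and the ViT-g-like dimensions below are illustrative assumptions, not the actual VideoMAE V2 code, and a reasonably recent PyTorch is assumed.

```python
# Hypothetical ViT-style stack used only to illustrate activation checkpointing.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Stand-in transformer block (illustrative, not the VideoMAE V2 block)."""
    def __init__(self, dim=1408, heads=16):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        y = self.norm1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

class Encoder(nn.Module):
    def __init__(self, depth=4, use_checkpoint=True):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.blocks = nn.ModuleList(Block() for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            if self.use_checkpoint and self.training:
                # Recompute this block's activations in the backward pass
                # instead of keeping them in memory during the forward pass.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Encoder().to(device)
    # Illustrative token shape only (1568 tokens of width 1408, batch of 1).
    tokens = torch.randn(1, 1568, 1408, device=device)
    loss = model(tokens).mean()
    loss.backward()  # block activations are recomputed here
```

Memory savings scale with model depth, since only block inputs (not intermediate activations) are kept; the cost is roughly one extra forward pass per training step.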