MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

Mixup and Cutmix #34

Closed (samrudhdhirangrej closed 2 years ago)

samrudhdhirangrej commented 2 years ago

Hello,

Thank you for sharing your work. I am looking at the fine-tuning code and trying to understand how to apply mixup and cutmix to videos. The train dataloader seems to provide a batch of shape (B, C, T, H, W). However, the mixup function from timm expects a batch of shape (B, C, H, W). I couldn't find the code that reshapes the batch before sending it to the mixup function. Am I missing something? Should we reshape the batch from (B, C, T, H, W) to (B, C*T, H, W) or to (B*T, C, H, W)? Which is the correct way?

Thank you.

yztongzhan commented 2 years ago

Hi @samrudhdhirangrej! Thanks for your suggestion. We have fixed this bug. Please refer to mixup.py for details.
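
For readers following the thread, here is a minimal sketch of how mixup and cutmix can operate on a (B, C, T, H, W) batch without reshaping. It is not the repository's mixup.py and not timm's API. The function names `mixup_video` and `cutmix_video` are hypothetical, NumPy stands in for PyTorch to keep the example self-contained, and pairing each sample with the one at the mirrored batch position is an assumption made here. Mixup is elementwise, so the tensor rank does not matter. For cutmix, this sketch cuts one spatial box over the last two axes (H, W) and applies it to every frame.

```python
import numpy as np

def mixup_video(x, y_onehot, lam):
    """Blend each clip with its mirrored-batch partner (assumed pairing).

    x: (B, C, T, H, W) array, y_onehot: (B, num_classes) array.
    """
    x_mixed = lam * x + (1.0 - lam) * x[::-1]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[::-1]
    return x_mixed, y_mixed

def cutmix_video(x, y_onehot, lam, rng):
    """Paste one spatial box from the partner clip into all frames."""
    B, C, T, H, W = x.shape
    # Box side lengths chosen so the box covers roughly (1 - lam) of the frame.
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(H * cut_ratio), int(W * cut_ratio)
    cy, cx = rng.integers(H), rng.integers(W)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, H)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, W)
    out = x.copy()
    # Index only the last two axes (H, W); the box is shared by all C and T.
    out[..., y1:y2, x1:x2] = x[::-1][..., y1:y2, x1:x2]
    # Recompute lam from the clipped box area so labels match the pixels.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (H * W)
    y_mixed = lam_adj * y_onehot + (1.0 - lam_adj) * y_onehot[::-1]
    return out, y_mixed
```

In practice `lam` is usually drawn per batch from a Beta(alpha, alpha) distribution, for example with `rng.beta(alpha, alpha)`. Note that a 4D cutmix that slices axes 2 and 3 would cut over (T, H) if given a 5D batch unchanged, which is why the sketch indexes the trailing axes explicitly.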