MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

videomae support frame size of 8 #58

Open zhudongwork opened 1 year ago

zhudongwork commented 1 year ago

The experiments in the paper use 16 or 32 frames per clip. Does VideoMAE support an input of 8 frames?

TitaniumOne commented 1 year ago

The experiments in the paper use 16 or 32 frames per clip. Does VideoMAE support an input of 8 frames?

I have the same problem, do you have any ideas?

My GPUs are limited, so I need fewer frames (8) and a bigger batch size (> 16) as input. If I feed 8 frames, an error like The size of tensor a (x) must match the size of tensor b (y) at non-singleton dimension is raised at:

x = self.patch_embed(x)
x = x + self.pos_embed.type_as(x).to(x.device).clone().detach()
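The mismatch is between the patch_embed output and the pre-built pos_embed: the positional table is sized for 16-frame clips, so an 8-frame clip produces only half as many temporal tokens. A minimal sketch of the token counts, assuming the paper's defaults (224×224 crops, 16×16 spatial patches, tubelet size 2):

```python
# A minimal sketch, assuming the paper's defaults: 224x224 crops,
# 16x16 spatial patches, tubelet size 2.
def num_tokens(frames, img_size=224, patch_size=16, tubelet_size=2):
    # tokens = temporal tubes * spatial patches per frame
    return (frames // tubelet_size) * (img_size // patch_size) ** 2

print(num_tokens(16))  # 1568 -> rows in the pre-built pos_embed
print(num_tokens(8))   #  784 -> tokens produced by patch_embed for 8 frames
```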
JinChow commented 1 year ago

@TitaniumOne Yeah, I have the same problem: the error The size of tensor a (x) must match the size of tensor b (y) at non-singleton dimension is raised at x = self.patch_embed(x) followed by x = x + self.pos_embed.type_as(x).to(x.device).clone().detach(). Have you solved the problem? I would appreciate any help, thank you!
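One way to get 8-frame clips working, sketched under the assumption that pos_embed is a fixed sinusoidal table (the .clone().detach() in the trace suggests it is not learned): rebuild the table for the new token count, or pass the clip length to the model builder if it exposes one (the training scripts in this repo take a num_frames option). The helper below is a generic sinusoidal table, not the repository's own code:

```python
import math
import torch

def sinusoid_pos_table(n_position, d_model):
    # Fixed sinusoidal positional table, shape (1, n_position, d_model).
    position = torch.arange(n_position, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))
    table = torch.zeros(n_position, d_model)
    table[:, 0::2] = torch.sin(position * div_term)
    table[:, 1::2] = torch.cos(position * div_term)
    return table.unsqueeze(0)

# 8 frames / tubelet size 2 = 4 temporal tubes; (224 / 16)**2 = 196 spatial
# patches -> 4 * 196 = 784 tokens. 768 is the ViT-Base width (an assumption;
# adjust to the checkpoint actually used).
pos_embed_8f = sinusoid_pos_table(784, 768)
print(pos_embed_8f.shape)  # torch.Size([1, 784, 768])
# Assigning this to the model's pos_embed attribute (the name used in the
# trace above) before the forward pass makes the shapes match again.
```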