OpenGVLab / VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0

RuntimeError: The size of tensor a (4) must match the size of tensor b (16) at non-singleton dimension 1 #71

Open FenggSuu opened 3 months ago

FenggSuu commented 3 months ago

ERROR when running run_videomamba_pretraining.py:

```
File ".../VideoMamba/videomamba/video_sm/models/videomamba_pretrain.py", line 433, in forward
    x_clip_vis = self.forward_features(x, mask)
File ".../VideoMamba/videomamba/video_sm/models/videomamba_pretrain.py", line 383, in forward_features
    x = x + self.temporal_pos_embedding
RuntimeError: The size of tensor a (4) must match the size of tensor b (16) at non-singleton dimension 1
```

I printed the sizes of x and self.temporal_pos_embedding just before the line x = x + self.temporal_pos_embedding:

```
print(x.size())                            # torch.Size([3136, 4, 576])
print(self.temporal_pos_embedding.size())  # torch.Size([1, 16, 576])
```
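The shapes above explain the error: PyTorch broadcasting can only match dimensions that are equal or 1, and here dimension 1 is 4 (the input's temporal tokens) versus 16 (the pretrained embedding's frames). A minimal repro with dummy tensors of the reported sizes:

```python
import torch

x = torch.zeros(3136, 4, 576)                      # 4 temporal tokens from the input clip
temporal_pos_embedding = torch.zeros(1, 16, 576)   # embedding pretrained for 16 frames

try:
    _ = x + temporal_pos_embedding                 # dim 1: 4 vs 16 -> cannot broadcast
except RuntimeError as e:
    print(e)
```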

Andy1621 commented 3 months ago

For a different number of input frames, please interpolate the temporal_pos_embedding.
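A minimal sketch of that interpolation, assuming the embedding has shape (1, T_pretrain, C) as reported above; the helper name is hypothetical, and linear interpolation along the temporal axis is one common choice (the repo may use a different mode):

```python
import torch
import torch.nn.functional as F

def resize_temporal_pos_embedding(pos_embed, num_frames):
    """Interpolate a (1, T, C) temporal positional embedding to num_frames. Hypothetical helper."""
    pos = pos_embed.permute(0, 2, 1)                                   # (1, C, T)
    pos = F.interpolate(pos, size=num_frames, mode="linear", align_corners=False)
    return pos.permute(0, 2, 1)                                        # (1, num_frames, C)

temporal_pos_embedding = torch.zeros(1, 16, 576)   # pretrained for 16 frames
resized = resize_temporal_pos_embedding(temporal_pos_embedding, 4)
print(resized.shape)                               # torch.Size([1, 4, 576])
```

After resizing, x + resized broadcasts cleanly against a (3136, 4, 576) input.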