OpenGVLab / VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0

About SSv2 dataset #47

Closed. lebron-2016 closed this issue 1 month ago

lebron-2016 commented 2 months ago

Hello, I am new to the video field and have run into some problems while processing the SSv2 dataset. I downloaded the official data from the SSv2 dataset website. How should I organize the data for model training and evaluation?

I got this error when decompressing the data. How can I solve it? [image]

Thanks!!

Andy1621 commented 2 months ago

Hi! Can you try to download the data from OpenDataLab?

lebron-2016 commented 2 months ago

OK! Thanks for your quick reply! Let me try it!

lebron-2016 commented 1 month ago

Hi, I downloaded the SSv2 data from OpenDataLab and hope to use the pre-trained weights you provided for evaluation, but I encountered this error: [image]

The command I used is as follows. Is this the correct way to evaluate and test the model? [image]

The model weights I used are from here: [image]

Could you please help me figure out how to solve this problem? Apart from the relevant paths, I have not made any other modifications to the original code.

Thanks again!!

Andy1621 commented 1 month ago

Hi! Please try to print the shape of x and temporal_position_embedding.

lebron-2016 commented 1 month ago

Hi, here are the shapes of x and temporal_position_embedding:

image

After the transformation, the second dimension of a batch is 20, but the corresponding dimension of self.temporal_pos_embedding is num_frames // kernel_size = 16 // 2 = 8. How can I solve this problem?

image image

By the way, there are some warning messages. I don’t know if they are normal.

image image

Andy1621 commented 1 month ago

Hi! Do you use all the commands in the script, like here? It's strange that your temporal shape is 20, since I set it by --num_frames. Besides, for temporal_pos_embedding, the kernel_size is equal to --tubelet_size (i.e., 1), so its temporal length should be the same as num_frames, which is also shown in the warning.
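A minimal sketch of the arithmetic discussed here, in plain Python. The helper names `temporal_positions` and `shapes_match` are hypothetical and not part of the VideoMamba code; they only assume that the temporal position embedding has num_frames // kernel_size entries, as described in this thread.

```python
def temporal_positions(num_frames: int, tubelet_size: int) -> int:
    """Temporal length of the position embedding (hypothetical helper).

    Assumes the temporal kernel size of the patch embedding equals
    tubelet_size, so the embedding has num_frames // tubelet_size entries.
    """
    if tubelet_size <= 0:
        raise ValueError("tubelet_size must be positive")
    if num_frames % tubelet_size != 0:
        raise ValueError("num_frames must be divisible by tubelet_size")
    return num_frames // tubelet_size


def shapes_match(input_temporal_len: int, num_frames: int, tubelet_size: int) -> bool:
    """Return True if the temporal dimension of x agrees with the embedding."""
    return input_temporal_len == temporal_positions(num_frames, tubelet_size)


if __name__ == "__main__":
    # Setting reported in the thread: 16 frames with a temporal kernel of 2.
    print(temporal_positions(16, 2))   # 8
    # Setting from the script: tubelet size 1, so the length equals num_frames.
    print(temporal_positions(16, 1))   # 16
    # A temporal dimension of 20 in x does not fit an embedding of length 8.
    print(shapes_match(20, 16, 2))     # False
```

If the temporal dimension of x differs from this value, the arguments passed on the command line (for example --num_frames and --tubelet_size) are worth comparing against the provided script.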

lebron-2016 commented 1 month ago

Hi, I checked and found that several items were indeed not aligned with your script. Now the evaluation runs smoothly. The results are as follows:

image

Thank you very much for your help!!