OpenGVLab / VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0

The data type when evaluating the performance. #48

Closed jiazhou-garland closed 1 month ago

jiazhou-garland commented 1 month ago

Hi, I'm trying to test your pre-trained model on ImageNet, namely videomamba_tiny, using /VideoMamba/videomamba/image_sm/exp/videomamba_small/run224.sh with the corresponding checkpoint.

However, the evaluated accuracy is only 3% on the ImageNet val set with 50,000 test images. I'm just wondering whether the following code changes led to the poor performance:

  1. I set amp to False by default to ensure the code runs correctly.
  2. The model parameters are torch.float, so I set the image dtype to torch.float as well by simply adding the line `images = images.type(torch.float)`.
Andy1621 commented 1 month ago

Hi! Can you share your GPU and environment details?