OpenGVLab / VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0

How to Perform Frame-by-Frame Inference Using Videomamba #53

Closed franklio closed 1 month ago

franklio commented 1 month ago

As far as I know, VideoMamba is currently used for action classification on trimmed videos. I've tested it on my own dataset and the results are very promising, with an accuracy of up to 97%. However, I'd like to perform frame-by-frame action recognition, for example with a sliding window, on untrimmed long videos that contain multiple action labels, similar to the Breakfast dataset. Crucially, I need a prediction for every frame of the video, with each frame assigned a specific action label. Is there any code available to achieve this? Thank you for your assistance!
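
For reference, this is roughly the kind of sliding-window loop I have in mind (just a rough sketch, not tested; `model` is assumed to be a trained VideoMamba clip classifier that maps a clip tensor of shape (B, C, T, H, W) to class logits, and `frames` are assumed to be already resized and normalized):

```python
import torch


@torch.no_grad()
def sliding_window_labels(model, frames, clip_len=16, stride=1, batch_size=8, device="cuda"):
    """frames: float tensor (T, C, H, W). Returns a (T,) tensor of per-frame
    labels by classifying overlapping windows and assigning each window's
    prediction to its center frame."""
    model.eval().to(device)
    num_frames = frames.shape[0]

    # Pad both ends by repeating the border frames so every frame gets a centered window.
    pad = clip_len // 2
    padded = torch.cat(
        [frames[:1].repeat(pad, 1, 1, 1), frames, frames[-1:].repeat(pad, 1, 1, 1)], dim=0
    )

    labels = torch.empty(num_frames, dtype=torch.long)
    starts = list(range(0, num_frames, stride))
    for i in range(0, len(starts), batch_size):
        batch_starts = starts[i:i + batch_size]
        clips = torch.stack([padded[s:s + clip_len] for s in batch_starts])
        # (B, T, C, H, W) -> (B, C, T, H, W), the layout the clip classifier is assumed to expect.
        clips = clips.permute(0, 2, 1, 3, 4).to(device)
        preds = model(clips).argmax(dim=-1).cpu()
        for p, s in zip(preds, batch_starts):
            labels[s] = p

    # If stride > 1, copy the nearest computed label onto the skipped frames.
    if stride > 1:
        for t in range(num_frames):
            if t % stride != 0:
                labels[t] = labels[t - (t % stride)]
    return labels
```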

Andy1621 commented 1 month ago

Good try! However, I have not run VideoMamba in a frame-by-frame setting. You may need to write a dataset with dense sampling yourself.
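
For example, a minimal dense-sampling dataset might look like the sketch below (untested; it assumes frames are already extracted into a folder and that the labels come as one integer per frame in a text file, both of which are hypothetical conventions here):

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class DenseClipDataset(Dataset):
    """Dense (per-frame) clip sampling from a long, untrimmed video.

    Assumed layout: `frame_dir` holds extracted frames (e.g. img_00001.jpg, ...)
    and `label_file` has one integer action label per frame. Item `idx` is the
    clip_len-frame clip centered on frame idx plus that frame's label, so every
    frame of the video becomes one sample."""

    def __init__(self, frame_dir, label_file, clip_len=16, transform=None):
        self.frame_paths = sorted(
            os.path.join(frame_dir, f)
            for f in os.listdir(frame_dir)
            if f.lower().endswith((".jpg", ".png"))
        )
        with open(label_file) as f:
            self.labels = [int(line.strip()) for line in f if line.strip()]
        assert len(self.labels) == len(self.frame_paths), "expected one label per frame"
        self.clip_len = clip_len
        self.transform = transform

    def __len__(self):
        return len(self.frame_paths)

    def _load_frame(self, i):
        # Clamp to the video boundaries so edge frames still get a full clip.
        i = min(max(i, 0), len(self.frame_paths) - 1)
        img = Image.open(self.frame_paths[i]).convert("RGB")
        if self.transform is not None:
            return self.transform(img)
        return torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 255.0

    def __getitem__(self, idx):
        start = idx - self.clip_len // 2
        clip = torch.stack(
            [self._load_frame(i) for i in range(start, start + self.clip_len)]
        )
        # (T, C, H, W) -> (C, T, H, W), the layout the video classifier is assumed to expect.
        return clip.permute(1, 0, 2, 3), self.labels[idx]
```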

franklio commented 1 month ago

Okay, I will try. Thank you for your response.