As far as I know, VideoMamba is currently used for clip-level action classification in videos. I've tested it on my own dataset, and the results are very promising, with an accuracy of up to 97%. However, I'd like to perform frame-by-frame action recognition with some other scheme, such as sliding windows, and apply it to long, untrimmed videos that contain multiple action labels, similar to the Breakfast dataset. The key requirement is a prediction for every frame of the video, with each frame assigned a specific action label. Is there any code currently available to achieve this?
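To make the sliding-window idea concrete, here is a minimal sketch of what I have in mind. It is not taken from the VideoMamba code base: `classify_clip` is a hypothetical callable that stands in for a trained clip classifier and is assumed to return one score per class, and the names `window_starts`, `framewise_labels`, and `toy_classifier`, as well as the default `window` and `stride` values, are placeholders I made up for illustration. Each window's scores are added to every frame it covers, and each frame takes the class with the highest accumulated score.

```python
import numpy as np


def window_starts(num_frames, window, stride):
    """Start indices chosen so that every frame falls in at least one window."""
    if num_frames <= window:
        return [0]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] != num_frames - window:
        starts.append(num_frames - window)  # extra window to cover the tail
    return starts


def framewise_labels(frames, classify_clip, num_classes, window=16, stride=4):
    """Assign one label per frame by accumulating clip-level scores.

    classify_clip is a hypothetical stand-in for the trained model: it takes
    a slice of frames and returns num_classes scores for that clip.
    """
    num_frames = len(frames)
    scores = np.zeros((num_frames, num_classes), dtype=np.float64)
    for start in window_starts(num_frames, window, stride):
        end = min(start + window, num_frames)
        clip_scores = np.asarray(classify_clip(frames[start:end]), dtype=np.float64)
        scores[start:end] += clip_scores  # every covered frame gets this vote
    return scores.argmax(axis=1)


def toy_classifier(clip):
    """Toy stand-in for a real model: one-hot vote for the majority label.

    Here each "frame" is just its integer ground-truth label (0 or 1), which
    is only useful for checking the windowing logic.
    """
    counts = np.bincount(clip, minlength=2)
    probs = np.zeros(2)
    probs[counts.argmax()] = 1.0
    return probs


# Toy "video": 20 frames of action 0 followed by 20 frames of action 1.
toy_video = np.array([0] * 20 + [1] * 20)
toy_labels = framewise_labels(toy_video, toy_classifier, num_classes=2,
                              window=8, stride=2)
```

With this kind of voting, frames right at an action boundary can be mislabeled because the windows covering them mix two actions, so a smaller stride or some temporal smoothing would probably be needed in practice.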
Thank you for your assistance!