VideoMamba is currently used for action classification in videos, but there are many other video tasks, such as video segmentation and video prediction. If we keep the Patch Embedding and the VideoMamba blocks as they are and only replace the NN head with a task-specific head, can the model still perform well on these other tasks? I would like to know your opinion. Thank you!
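To make the question concrete, here is a minimal NumPy sketch of what "replacing only the head" could look like. It rests on assumptions the question does not state. The backbone output is taken to be one class token followed by T·h·w patch tokens in time-major, row-major order. The temporal patch size is assumed to be 1, and both heads are single linear layers. The function names and all sizes are hypothetical, and random arrays stand in for real VideoMamba features, so this only shows the tensor shapes involved, not how well the model would perform.

```python
import numpy as np

# Hypothetical sizes (kept small); none of these come from the VideoMamba code base.
B, T, H, W = 2, 4, 64, 64   # batch, frames, frame height, frame width
P, C = 16, 32               # spatial patch size, embedding dim (temporal patch size assumed 1)
h, w = H // P, W // P       # patch grid per frame (4 x 4)

rng = np.random.default_rng(0)

# Stand-in for the shared backbone output (Patch Embedding + VideoMamba blocks):
# one class token followed by T*h*w spatiotemporal patch tokens, each of width C.
tokens = rng.standard_normal((B, 1 + T * h * w, C))

def classification_head(tokens, weight, bias):
    """Action-classification head: linear layer on the class token -> (B, num_classes)."""
    cls = tokens[:, 0, :]
    return cls @ weight + bias

def segmentation_head(tokens, weight, bias, T, h, w, P):
    """Hypothetical dense head: drop the class token, restore the (T, h, w) patch grid,
    apply a per-token linear layer, then upsample by nearest-neighbour repetition
    -> (B, T, h*P, w*P, num_masks)."""
    patches = tokens[:, 1:, :].reshape(tokens.shape[0], T, h, w, -1)
    logits = patches @ weight + bias                      # (B, T, h, w, num_masks)
    return logits.repeat(P, axis=2).repeat(P, axis=3)    # (B, T, h*P, w*P, num_masks)

num_classes, num_masks = 400, 21
cls_logits = classification_head(
    tokens, rng.standard_normal((C, num_classes)), np.zeros(num_classes))
seg_logits = segmentation_head(
    tokens, rng.standard_normal((C, num_masks)), np.zeros(num_masks), T, h, w, P)
print(cls_logits.shape)  # (2, 400)
print(seg_logits.shape)  # (2, 4, 64, 64, 21)
```

The design point is that only the head changes: the classification head reads a single pooled token, while a dense head needs every patch token and the grid layout they came from. Whether features trained for classification are fine-grained enough for such dense heads is exactly the open question above.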