I have been working on taking VideoMamba's multi-modal pre-trained weights and applying them to other VQA downstream tasks/datasets. So far, I have been using the UMT_VideoMamba model, as it is the model compatible with the provided pre-trained weights, but I stumbled upon the UMT_QA model, which appears to be specifically tailored for VQA (i.e., it processes both questions and answers simultaneously and ranks candidate answers), along with what appear to be outdated QA config files in the 'configs' folder. Before I spend time looking into this UMT_QA model, I just wanted to confirm whether it was ever used, or if it is deprecated. I was not able to find any references to the model in the repo besides its initialization, so I am assuming it was a research idea that ended up not being used. Thanks in advance for your help!
Hi! UMT is my previous paper, and this repo is built on it. I did fine-tune UMT via UMT_QA, so it is not deprecated. For VideoMamba, you may need to adapt the UMT_QA code into a UMT_VideoMamba_QA model.
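In case it helps, here is a minimal sketch of the adaptation pattern being suggested: keep the UMT_QA-style answer-ranking head, but swap in the VideoMamba vision encoder. Note this is an illustration only, not the repo's actual API — the class name `UMT_VideoMamba_QA`, the constructor arguments, and the naive additive fusion are all assumptions; the real UMT_QA uses its own cross-modal fusion and config plumbing.

```python
# Hedged sketch only: all names and the fusion scheme here are hypothetical,
# not the actual VideoMamba/UMT_QA implementation.
import torch
import torch.nn as nn


class UMT_VideoMamba_QA(nn.Module):
    """Pairs a (pre-trained) vision encoder with a UMT_QA-style ranking head.

    The idea: encode the video once, encode each (question, candidate answer)
    pair, fuse, and score every candidate so answers can be ranked.
    """

    def __init__(self, vision_encoder, text_encoder, hidden_dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g. VideoMamba loaded with pre-trained weights
        self.text_encoder = text_encoder      # encodes question + candidate answer jointly
        self.rank_head = nn.Linear(hidden_dim, 1)  # one score per candidate answer

    def forward(self, video_feats, qa_feats):
        # video_feats: (B, D) pooled video features
        # qa_feats:    (B, K, D) features for K candidate answers per video
        # Naive additive fusion purely for illustration; the real model
        # would use cross-attention between modalities.
        fused = video_feats.unsqueeze(1) + qa_feats
        return self.rank_head(fused).squeeze(-1)  # (B, K) ranking logits


# Usage: rank 5 candidate answers for a batch of 2 videos.
model = UMT_VideoMamba_QA(vision_encoder=nn.Identity(),
                          text_encoder=nn.Identity(),
                          hidden_dim=512)
logits = model(torch.randn(2, 512), torch.randn(2, 5, 512))
print(logits.shape)  # torch.Size([2, 5])
```

The predicted answer is then simply `logits.argmax(dim=-1)` over the K candidates.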