Is the zero-shot VideoQA performance in Table 20 from the model fine-tuned on the ImageQA dataset?

X-PLUG / mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

Apache License 2.0

Open: tgyy1995 opened this issue 1 year ago

fuyu1998 commented 10 months ago

@tgyy1995 Have you found the answer yet? I have the same question.