Weight for QA benchmarks

RenShuhuai-Andy / TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

https://arxiv.org/abs/2312.02051

BSD 3-Clause "New" or "Revised" License


Weight for QA benchmarks #36

Closed: NIneeeeeem closed this issue 3 months ago

NIneeeeeem commented 3 months ago

Hi there. Thanks for sharing your great work.

I wonder whether the performance on the recently released QA benchmarks is zero-shot performance. That is, was it obtained with the original TimeChat weights, or are there new fine-tuned weights?

RenShuhuai-Andy commented 3 months ago

Hi, thanks for your interest.

We conduct zero-shot evaluation on the QA benchmarks using the original TimeChat weights.

NIneeeeeem commented 3 months ago

@RenShuhuai-Andy Thanks very much for your reply. I just tested the zero-shot performance, and it does perform well on open-set VQA.

However, on multiple-choice benchmarks such as EgoScheme, the output does not match the expected answer format. This problem also occurs with many models that have not been fine-tuned on multiple-choice questions, and I wonder if you have a good solution for it!
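For readers hitting the same format mismatch, one common workaround is to post-process the free-form response into an option index. The sketch below is only an illustration under stated assumptions, not the TimeChat authors' evaluation code: the function name `extract_choice`, the regular expressions, and the `min_ratio` threshold are all hypothetical choices. It first looks for an explicit option letter, then falls back to fuzzy matching against the option texts.

```python
import re
import string
from difflib import SequenceMatcher


def extract_choice(response, options, min_ratio=0.6):
    """Map a free-form response to an option index, or None if nothing matches.

    Hypothetical helper: patterns and threshold are illustrative assumptions.
    """
    letters = string.ascii_uppercase[:len(options)]
    text = response.strip()

    patterns = [
        # Response starts with a letter: "B", "(B)", "B.", "B) ...", "B: ..."
        rf"^\(?([{letters}])\)?\s*(?:[.):]|$)",
        # Phrases such as "The answer is C" or "Answer: (C)"
        rf"(?i:answer(?:\s+is)?)\s*:?\s*\(?([{letters}])\)?(?![A-Za-z])",
    ]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return letters.index(match.group(1))

    # Fallback: pick the option whose text is most similar to the response.
    scores = [
        SequenceMatcher(None, text.lower(), option.lower()).ratio()
        for option in options
    ]
    best = max(range(len(options)), key=scores.__getitem__)
    return best if scores[best] >= min_ratio else None


if __name__ == "__main__":
    opts = ["cooking a meal", "washing dishes", "reading a book"]
    print(extract_choice("(B) The person washes dishes.", opts))
    print(extract_choice("The answer is C.", opts))
    print(extract_choice("I cannot tell from the video.", opts))
```

Responses that return `None` can be counted as wrong or re-queried with a stricter prompt; which policy to use is a design choice that should be reported alongside the benchmark numbers.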

NIneeeeeem commented 3 months ago

Thank you for sharing your work! No problem now.