LLaVA-VL / LLaVA-NeXT


Request for NExTQA Dataset Evaluation Prompt and More Results on Challenging Datasets for Fair Comparison #3

Open patrick-tssn opened 3 months ago

patrick-tssn commented 3 months ago

To my knowledge, the videos in the NExTQA dataset are relatively short, with an average length of 44 seconds, and a static bias has been noted for the ActivityNet QA dataset [1]. Could you present further results on more demanding datasets, such as EgoSchema [2], for a fairer comparison? Additionally, could I ask you to provide the evaluation prompt used for the NExTQA dataset?

[1] Lei, Jie, et al. "Revealing Single Frame Bias for Video-and-Language Learning." arXiv:2206.03428 (2022).
[2] Mangalam, Karttikeya, et al. "EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding." arXiv:2308.09126 (2023).

ZhangYuanhan-AI commented 3 months ago

Thanks for your advice. The evaluation on EgoSchema is ongoing.

The prompt for NExTQA is: "Answer the question using several words or phrase."
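
In case it helps others reproducing the evaluation, below is a minimal sketch of how this prompt might be appended to each NExTQA question before querying the model; the function name and the example question are illustrative assumptions, not the repository's actual evaluation code.

```python
# Minimal sketch (assumed, not the repo's evaluation script): append the
# stated NExTQA prompt suffix to each question before sending it to the model.

NEXTQA_PROMPT = "Answer the question using several words or phrase."

def build_query(question: str) -> str:
    """Combine a NExTQA question with the evaluation prompt suffix."""
    return f"{question.strip()}\n{NEXTQA_PROMPT}"

if __name__ == "__main__":
    # Hypothetical example question; real items come from the NExTQA annotations.
    print(build_query("What did the man do after opening the fridge?"))
```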