Requesting to Add Video Evaluation Benchmark - VELOCITI

Hi there! Thanks for your effort in maintaining this amazing repository.

This is a request to add our recent work on evaluating Video-Language Models. We propose an evaluation benchmark, VELOCITI.

Please find the relevant details below.

Title:

VELOCITI: Can Video-Language Models Bind Semantic Concepts Through Time?

About:

To keep up with the rapid pace at which Video-Language Models (VLMs) are being proposed, our primary motivation is to provide a benchmark that evaluates current SoTA and upcoming VLMs on compositionality, a fundamental aspect of vision-language understanding. This is achieved through carefully designed tests that evaluate various aspects of perception and binding. With this, we aim to provide a more accurate gauge of VLM capabilities, encouraging research towards improving VLMs and preventing shortcomings that may percolate into the systems that rely on such models.

ArXiv: https://arxiv.org/abs/2406.10889v1

GitHub: https://github.com/katha-ai/VELOCITI

Project Page and Demo: https://katha-ai.github.io/projects/velociti/

Please let me know if I missed any required details. Thanks for your time.
