Nicous20 / FunQA

FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.
https://funqa-benchmark.github.io/
MIT License
96 stars 0 forks source link

FunQA: Towards Surprising Video Comprehension (ECCV 2024)

paper page Dataset Youtube Bilibili

https://github.com/Jingkang50/FunQA/assets/17070708/2a03eb8b-dd2a-4eaf-b2a4-fcd36c56b54a

Welcome to FunQA's Codebase Repository!

This repo provides the code for evaluating your model's output (json file).

Introducing FunQA

The motivation for the FunQA is straightforward: Humans enjoy surprising videos, including funny clips, creative performances, or visual illusions. We aim to evaluate and empower AI models with similar capabilities.

FunQA is a VideoQA dataset to evaluate and enhance the model's video reasoning capability upon counter-intuitive videos, including humorous and funny viral videos from TikTok, creative performance from Kasou Taishou (欽ちゃん&香取慎吾の全日本仮装大賞), and magic videos from YouTube and TikTok.

We establish rigorous QA tasks designed to assess the model's capability in counter-intuitive timestamp localization, detailed video description, and reasoning around counter-intuitiveness. We also pose higher-level tasks, such as attributing a fitting and vivid title to the video, and scoring the video creativity.

In total, the FunQA benchmark consists of 312K free-text QA pairs derived from 4.3K video clips, spanning a total of 24 video hours. Extensive experiments with existing VideoQA models reveal significant performance gaps for the FunQA videos across spatial-temporal reasoning, visual-centered reasoning, and free-text generation.

Updates

Todo

  1. [x] Release the FunQA dataset and arXiv paper.
  2. [x] Release evaluation code.
  3. [ ] Release the FunQA Extended dataset.

Table of Contents

1 - FunQA Benchmark

1.1 - FunQA Main Tasks

FunQA comprises three subsets of surprising videos: 1) HumorQA, 2) CreativeQA, and 3) MagicQA. Each subset is associated with three common tasks: 1) counter-intuitive timestamp localization, 2) detailed video description, and 3) reasoning around counter-intuitiveness (see H1-3, C1-3, and M1-3). Furthermore, we offer higher-level tasks tailored for each video type, such as attributing a fitting and vivid title for HumorQA and CreativeQA (see H4, C4), etc. img.png

1.2 - FunQA Extended Tasks

FunQA Multi-choice Dataset

FunQA Multi-choice Dataset is prepared to provide training and testing for arbitrary models, in this dataset our QA pairs are in the form of multiple choice, the answer is a word, phrase, or short sentence, and the type of questions are all descriptions. FunQA_MC.png

FunQA Dialog Dataset

Most of the current LLMs are in the form of dialogues. To cater to their data input, we produced the FunQA Dialog dataset, in which we used GPT-3.5 to convert QA pairs into recursive dialogues with added context. img_1.png

2 - Data Preparation

Please download all the videos and annotation files from here.

For FunQA Dataset: there are four zip files:

For FunQA Multi-choice Dataset:

3 - Evaluation

cd FunQA
conda create -n funqa python=3.10

# install bleurt
git clone https://github.com/google-research/bleurt.git
cd bleurt
pip install .

# download recommended checkpoint for bleurt

wget https://storage.googleapis.com/bleurt-oss-21/BLEURT-20.zip .
unzip BLEURT-20.zip

pip install -r requirements.txt
conda activate funqa

Please move archive bleurt/bleurt to bleurt/ Then edit and run ./scripts/run_classic_eval.sh and ./scripts/run_gpt4_eval.sh for evalution.

Acknowledgement

This study is supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 (MOE-T2EP20221- 0012), NTU NAP, and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

If you're using FunQA in your research or applications, please cite using this BibTeX:

  @misc{xie2024funqasurprisingvideocomprehension,
        title={FunQA: Towards Surprising Video Comprehension}, 
        author={Binzhu Xie and Sicheng Zhang and Zitang Zhou and Bo Li and Yuanhan Zhang and Jack Hessel and Jingkang Yang and Ziwei Liu},
        year={2024},
        eprint={2306.14899},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2306.14899}, 
  }

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Looking forward to your feedback and please raise any issues or questions here.