Karttikeya Mangalam, Raiymbek Akshulakov, Jitendra Malik
Berkeley AI Research, UC Berkeley
:globe_with_meridians: Webpage | :book: Paper | :movie_camera: Teaser Video | :microphone: 4-min Podcast | :speaking_head: Overview Talk Video | :bar_chart: Statistics Dashboard | :crossed_swords: Kaggle
:exclamation: EgoSchema demands 10x to 100x longer temporal reasoning than almost all other video datasets**.
:exclamation: The largest open-source video-language models with 7B+ parameters achieve QA accuracy of <33% (random choice is 20%). Humans achieve ~76%.
:exclamation: Even closed-source models with 100B+ parameters trained at web scale achieve <40% accuracy, highlighting the massive latent gap in model capabilities for long-term video understanding.
**Please see the paper for precise operationalizations.
Currently the best way to download the dataset:
kaggle competitions download -c egoschema-public
Fast. Supports resuming over spotty internet.
Download the `uid_to_url.json` file from the EgoSchema Google Drive. This file will be updated weekly, as the links expire every 7 days. You will receive a warning if your file becomes outdated.
conda create -n egoschema_download python=3.8
conda activate egoschema_download
conda install tqdm simplejson requests
pip install moviepy
mkdir videos
python download.py
Note 1: This will retrieve the dataset and store it in the `videos` directory. Video names correspond to the `q_uid` key in the `questions.json` file. If any files encounter issues during download, rerun the `download.py` script. If problems persist, links to the necessary files will be provided via Google Drive.
Note 2: If the above is too slow, run `python download_multiproc.py --p <number of processes>` instead for multi-process downloading. Please be aware that this method might hit the rate limit under heavy load on the Wasabi servers. In that case, please revert to `download.py`.
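Since Note 1 ties each video name to a `q_uid` in `questions.json`, a quick completeness check after downloading can be sketched as follows. This is a minimal sketch, not part of the repository: the helper name `find_missing_videos` is hypothetical, and it assumes that `questions.json` is a list of objects with a `q_uid` key and that videos are saved as `<q_uid>.mp4`.

```python
import json
from pathlib import Path


def find_missing_videos(questions, video_dir, ext=".mp4"):
    """Return the sorted q_uids in `questions` that have no file in `video_dir`.

    Assumption: each video is stored as <q_uid><ext>; adjust `ext` if the
    download script uses a different extension.
    """
    present = {p.stem for p in Path(video_dir).glob(f"*{ext}")}
    return sorted(q["q_uid"] for q in questions if q["q_uid"] not in present)


# Example usage (assumes questions.json is a list of {"q_uid": ...} objects):
#   with open("questions.json") as f:
#       questions = json.load(f)
#   missing = find_missing_videos(questions, "videos")
#   print(f"{len(missing)} of {len(questions)} videos missing")
```

If the list is non-empty, rerunning `download.py` as described in Note 1 is the next step.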
Simpler. Requires stable internet connection.
While we release all the videos and questions from EgoSchema, we release the correct answers to only 500 of the EgoSchema questions, provided in the `subset_answers.json` file, intended for offline experimentation and performance tracking.
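For offline tracking on the 500-question subset, accuracy can be computed locally. The sketch below is illustrative only: the function name `subset_accuracy` is hypothetical, and it assumes that `subset_answers.json` maps each question uid to the integer index of the correct option, which should be checked against the actual file.

```python
import json


def subset_accuracy(predictions, answers):
    """Fraction of the questions in `answers` that `predictions` gets right.

    Both arguments map question uid -> option index. Questions missing
    from `predictions` count as wrong.
    """
    if not answers:
        raise ValueError("answers is empty")
    correct = sum(1 for uid, ans in answers.items() if predictions.get(uid) == ans)
    return correct / len(answers)


# Example usage (file layout is an assumption, see above):
#   with open("subset_answers.json") as f:
#       answers = json.load(f)
#   print(f"subset accuracy: {subset_accuracy(my_predictions, answers):.3f}")
```

With five options per question, a score near 0.2 is what random guessing would give.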
:loudspeaker: EgoSchema is intended as a 0-shot evaluation benchmark, hence the full correct-answer file will not be made public. To evaluate on the entire benchmark, please submit your predicted answers as follows:
**Option A (public Kaggle leaderboard):**
The primary means of submitting results. Use `egoschema-public` as the competition name.
usage: kaggle competitions submit [-h] -f FILE_NAME -m MESSAGE [-q] [competition]

required arguments:
  -f FILE_NAME, --file FILE_NAME
      File for upload (full path)
  -m MESSAGE, --message MESSAGE
      Message describing this submission

optional arguments:
  -h, --help
      show this help message and exit
  competition
      Competition URL suffix (use "kaggle competitions list" to show options)
      If empty, the default competition will be used (use "kaggle config set competition")"
  -q, --quiet
      Suppress printing information about the upload/download progress
**Option B (using our provided wrapper):**
No leaderboard, just a submission validation.
- **Step 1**: Prepare a JSON file that contains a dictionary structured as `{<question uid>: <correct answer>}`, where `<correct answer>` is an integer in `[0, 4]`.
- **Step 2**: Run `python validate.py --f <path_to_json_file>` to send the request to the EgoSchema server.
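The file from Step 1 can be produced and sanity-checked before sending. The following is a minimal sketch under the format stated above; `check_submission` and `write_submission` are hypothetical helper names, not part of the repository, and the requirement that uids be strings is an assumption that follows from JSON object keys.

```python
import json


def check_submission(preds, num_options=5):
    """Return `preds` if it maps str uids to ints in [0, num_options - 1].

    Raises ValueError on the first entry that breaks the format.
    """
    if not isinstance(preds, dict):
        raise ValueError("submission must be a dictionary")
    for uid, ans in preds.items():
        if not isinstance(uid, str):
            raise ValueError(f"question uid {uid!r} is not a string")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(ans, bool) or not isinstance(ans, int):
            raise ValueError(f"answer for {uid!r} is not an integer: {ans!r}")
        if not 0 <= ans < num_options:
            raise ValueError(f"answer for {uid!r} is out of range: {ans}")
    return preds


def write_submission(preds, path):
    """Validate `preds` and write it to `path` as JSON."""
    with open(path, "w") as f:
        json.dump(check_submission(preds), f)
```

The resulting file can then be passed to `validate.py` through `--f`.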
**Option C (directly using CURL):**
No leaderboard, just a submission validation.
- `curl -X POST -H "Content-Type: application/json" -d @<path_to_json_file> https://validation-server.onrender.com/api/upload/`
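For scripted evaluation, the same POST can be issued from Python with only the standard library. This is a sketch that mirrors the `curl` command above, not the implementation of `validate.py`; the function names and the `timeout` parameter are hypothetical, and the response is assumed to be UTF-8 text.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
VALIDATION_URL = "https://validation-server.onrender.com/api/upload/"


def build_request(preds, url=VALIDATION_URL):
    """Build a POST request carrying `preds` as a JSON body."""
    body = json.dumps(preds).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(preds, timeout=60):
    """Send the predictions and return the server's reply as text."""
    with urllib.request.urlopen(build_request(preds), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect the payload without contacting the server.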
**Returned Payload** will contain the multiple-choice question-answer (MCQ) accuracy in the following text format:
- MCQ Accuracy for All of 5031 EgoSchema Questions
- MCQ Accuracy for Publicly Released 500 EgoSchema Answers
:fireworks: *Coming Soon* : A public leaderboard of submitted model rankings on EgoSchema.
### 2. Reproducing model results from the paper
Please see `benchmarking/` for a detailed description of each model.