Mr. HiSum: A Large-scale Dataset for Video Highlight Detection And Summarization

Mr. HiSum is a large-scale dataset for video highlight detection and summarization. It contains 31,892 videos selected from the YouTube-8M dataset, with reliable frame importance score labels aggregated from more than 50,000 users per video.

Most Replayed Statistics for Summarization

Example 1: AC Sparta Praha - Top 10 goals, season 2013/2014

[Figure: GIFs of scenes 1 to 4]

The four most viewed scenes in the "AC Sparta Praha" video (Link) all show players scoring goals.

Example 2: Best Bicycle Kick Goals in Peru

[Figure: GIFs of scenes 1 to 4]

The four most viewed scenes in the above video all show players scoring goals with amazing bicycle kicks. (Link)

Example 3: Neo - 'The One' | The Matrix

[Figure: GIFs of scenes 1 to 3]

In the most viewed scene, marked 1 in the video, Neo is shot as soon as he meets Agent Smith. In the second most viewed scene, marked 2, multiple Agent Smiths shoot at Neo, who reaches out his hand to block the bullets. Lastly, in the third most viewed scene, marked 3, Neo engages in combat with Agent Smith. (Link)

Update


Getting Started

  1. Download the YouTube-8M dataset and place it under your dataset path. For example, when your dataset path is /data/dataset/, place your yt8m folder under the dataset path.

  2. Download mr_hisum.h5 and metadata.csv and place them under the dataset folder.

  3. Create a virtual environment using the following command:

    conda env create -f environment.yml
    conda activate mrhisum
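
Before running anything, it can help to confirm that the downloaded assets are where you expect them. The sketch below is a hypothetical helper, not part of the repository: `missing_assets` is an invented name, and it only checks that the entries named in the steps above exist under a directory you pass in.

```python
from pathlib import Path

# Entry names come from the setup steps; which directory holds each one
# depends on your own layout, so the caller passes the root explicitly.
EXPECTED = ("yt8m", "mr_hisum.h5", "metadata.csv")


def missing_assets(root, expected=EXPECTED):
    """Return the expected entries that do not exist directly under root."""
    base = Path(root)
    return [name for name in expected if not (base / name).exists()]
```

For example, `missing_assets("/data/dataset", ["yt8m"])` returns an empty list once the yt8m folder is in place, and `missing_assets("dataset", ["mr_hisum.h5", "metadata.csv"])` does the same for the two downloaded files, assuming they were placed in a folder named dataset.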

Complete Mr.HiSum Dataset

A complete mr_hisum.h5 contains four fields:

  1. features: Video frame features from the YouTube-8M dataset.
  2. gtscore: Most Replayed statistics normalized to scores between 0 and 1.
  3. change_points: Shot boundary information obtained using the Kernel Temporal Segmentation algorithm.
  4. gtsummary: Ground truth summary obtained by solving a 0/1 knapsack problem over the shots.

We provide three fields, gtscore, change_points, and gtsummary, inside mr_hisum.h5.
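
To illustrate how the last three fields relate, here is a minimal pure-Python sketch of shot selection by 0/1 knapsack. It is not the code used to build mr_hisum.h5. It assumes that each entry of change_points is an inclusive [start, end] frame pair, that a shot's value is the mean gtscore of its frames, and that the summary budget is a fraction of the video length (`budget_ratio`, defaulting to 0.15 here). All function names are hypothetical.

```python
def shot_lengths_and_scores(gtscore, change_points):
    """Per-shot length and mean frame score (inclusive [start, end] assumed)."""
    lengths, scores = [], []
    for start, end in change_points:
        frames = gtscore[start:end + 1]
        lengths.append(len(frames))
        scores.append(sum(frames) / len(frames))
    return lengths, scores


def knapsack_select(values, weights, capacity):
    """0/1 knapsack by dynamic programming; returns the chosen item indices."""
    n = len(values)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it
    # Walk the table backwards to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)


def make_summary(gtscore, change_points, budget_ratio=0.15):
    """Binary per-frame summary: 1 for frames in selected shots, else 0."""
    lengths, scores = shot_lengths_and_scores(gtscore, change_points)
    capacity = int(len(gtscore) * budget_ratio)
    summary = [0] * len(gtscore)
    for idx in knapsack_select(scores, lengths, capacity):
        start, end = change_points[idx]
        for frame in range(start, end + 1):
            summary[frame] = 1
    return summary


# Toy video: 10 frames, 5 shots, budget of 3 frames.
scores = [0.1, 0.1, 0.9, 0.8, 0.2, 0.2, 0.7, 0.1, 0.1, 0.1]
shots = [[0, 1], [2, 3], [4, 5], [6, 6], [7, 9]]
print(make_summary(scores, shots, budget_ratio=0.3))
# prints [0, 0, 1, 1, 0, 0, 1, 0, 0, 0]
```

In the toy example, the two-frame shot with mean score 0.85 and the one-frame shot with score 0.7 together fill the three-frame budget, so they form the summary.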

After downloading the YouTube-8M dataset, you can add the features field using

python preprocess/preprocess.py --dataset_path <your_dataset_path>/yt8m

For example, if your dataset path is /data/dataset/, run the command below.

python preprocess/preprocess.py --dataset_path /data/dataset/yt8m

Please read DATASET.md for more details about Mr.HiSum.


Baseline models on Mr.HiSum

We provide compatible code for three baseline models: PGL-SUM, VASNet, and SL-module.

You can train baseline models on Mr.HiSum from scratch using the following commands.

Furthermore, we provide trained checkpoints of each model for reproducibility.

Use the command below to run inference with a trained checkpoint.

python main.py --train False --model <model_type> --ckpt_path <checkpoint file path> --tag inference

For example, if you download the VASNet checkpoint and place it inside the dataset folder, run the following command.

python main.py --train False --model VASNet --ckpt_path dataset/vasnet1_best_f1.pkl --tag vasnet_inference
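
Note that `--train False` passes the text "False" on the command line, so a script has to convert that string to a boolean explicitly. The sketch below shows one common way to do this with argparse. It is an illustration only: `str2bool`, `build_parser`, and the default values are assumptions, and the repository's main.py may parse its flags differently.

```python
import argparse


def str2bool(text):
    """Convert 'True'/'False' style strings; bool('False') would be True."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Hypothetical parser covering only the flags shown in the commands above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train", type=str2bool, default=True)
    parser.add_argument("--model", type=str, default="SimpleMLP")  # assumed default
    parser.add_argument("--ckpt_path", type=str, default=None)
    parser.add_argument("--tag", type=str, default="dev")  # assumed default
    return parser


args = build_parser().parse_args(
    ["--train", "False", "--model", "VASNet",
     "--ckpt_path", "dataset/vasnet1_best_f1.pkl", "--tag", "vasnet_inference"]
)
print(args.train, args.model)  # prints: False VASNet
```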

Train your summarization model on Mr.HiSum

We provide sample code for training and evaluating summarization models on Mr.HiSum.

Summarization model developers can test their own models by implementing PyTorch models under the networks folder.

We provide the SimpleMLP summarization model as a toy example.

You can train your model on the Mr.HiSum dataset using the command below. Modify or add configuration options to suit your needs.

python main.py --train True --batch_size 8 --epochs 50 --tag exp1

License of Assets

This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, following the YouTube-8M dataset. All Mr.HiSum dataset users must comply with the YouTube Terms of Service and the YouTube API Services Terms of Service.

This code builds on PGL-SUM, VASNet, and SL-module. Code taken from those repositories remains under their respective licenses. The license for our own code can be found in LICENSE.


Citation

If you find this work useful in your research, please consider citing our paper:


@article{sul2024mr,
  title={Mr. HiSum: a large-scale dataset for video highlight detection and summarization},
  author={Sul, Jinhwan and Han, Jihoon and Lee, Joonseok},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}