Mr. HiSum is a large-scale video highlight detection and summarization dataset, which contains 31,892 videos selected from the YouTube-8M dataset and reliable frame importance score labels aggregated from the viewing behavior of 50,000+ users per video.
[Figure: panels 1–4] The four most viewed scenes in the "AC Sparta Praha" video (Link) all show players scoring goals.
[Figure: panels 1–4] The four most viewed scenes in the above video all show players scoring goals with amazing bicycle kicks. (Link)
[Figure: panels 1–3] In the most viewed scene, marked 1 in the video, Neo is shot as soon as he meets Agent Smith. In the second most viewed scene, marked 2, a crowd of Agent Smiths shoots at Neo, and Neo reaches out his hand to block the bullets. Lastly, in the third most viewed scene, marked 3, Neo engages in combat with Agent Smith. (Link)
Download the YouTube-8M dataset and place it under your dataset path. For example, when your dataset path is `/data/dataset/`, place your `yt8m` folder under that path.
Download `mr_hisum.h5` and `metadata.csv` and place them under the `dataset` folder.
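To avoid confusion between your dataset path and the repository's `dataset` folder, the expected layout looks roughly like this (the repository root name is a placeholder):

```
/data/dataset/           # your dataset path
└── yt8m/                # YouTube-8M files
<repository root>/
└── dataset/
    ├── mr_hisum.h5
    └── metadata.csv
```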
Create a virtual environment using the following commands:

```
conda env create -f environment.yml
conda activate mrhisum
```
Your `mr_hisum.h5` needs four fields:

- `features`: Video frame features from the YouTube-8M dataset.
- `gtscore`: The "Most replayed" statistics, normalized to a score between 0 and 1.
- `change_points`: Shot boundary information obtained using the Kernel Temporal Segmentation (KTS) algorithm.
- `gtsummary`: Ground truth summary obtained by solving a 0/1 knapsack problem over shots (see the sketch below).

We provide three of these fields, `gtscore`, `change_points`, and `gtsummary`, inside `mr_hisum.h5`.
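As a rough illustration of how a `gtsummary`-style label can be derived from `gtscore` and `change_points`, here is a minimal knapsack sketch. The 15% length budget, the mean-score aggregation, and the inclusive `[start, end]` shot boundaries are assumptions for illustration, not the dataset's exact recipe:

```python
import numpy as np

def knapsack_summary(gtscore, change_points, budget_ratio=0.15):
    """Select shots maximizing total importance under a length budget.

    gtscore: (n_frames,) per-frame importance in [0, 1].
    change_points: (n_shots, 2) inclusive [start, end] frame indices (assumed).
    budget_ratio: assumed 15% summary length budget (illustrative).
    """
    n_frames = len(gtscore)
    # Shot value = mean frame importance; shot weight = length in frames.
    values = np.array([gtscore[s:e + 1].mean() for s, e in change_points])
    weights = np.array([e - s + 1 for s, e in change_points])
    capacity = int(n_frames * budget_ratio)

    # Classic 0/1 knapsack dynamic program over shots.
    n = len(values)
    dp = np.zeros((n + 1, capacity + 1))
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]
            if w <= c:
                dp[i][c] = max(dp[i][c], dp[i - 1][c - w] + v)

    # Backtrack to recover the selected shots as a binary per-frame summary.
    summary = np.zeros(n_frames, dtype=np.int32)
    c = capacity
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            s, e = change_points[i - 1]
            summary[s:e + 1] = 1
            c -= weights[i - 1]
    return summary
```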
After downloading the YouTube-8M dataset, you can add the `features` field using:

```
python preprocess/preprocess.py --dataset_path <your_dataset_path>/yt8m
```

For example, when your dataset path is `/data/dataset/`, use the command below.

```
python preprocess/preprocess.py --dataset_path /data/dataset/yt8m
```
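To sanity-check the result, a quick inspection sketch (assuming the `h5py` package, and assuming videos are stored as top-level groups each holding the four fields; the exact group keys may differ):

```python
import h5py

# Open the prepared dataset and inspect one video entry.
with h5py.File("dataset/mr_hisum.h5", "r") as f:
    video_id = list(f.keys())[0]          # e.g. the first video group
    group = f[video_id]
    print(video_id, list(group.keys()))   # expect: features, gtscore, change_points, gtsummary
    print("features:", group["features"].shape)
    print("gtscore:", group["gtscore"].shape)
```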
Please read DATASET.md for more details about Mr.HiSum.
We provide compatible code for three baseline models: PGL-SUM, VASNet, and SL-module.
You can train baseline models on Mr.HiSum from scratch using the following commands.
PGL-SUM

```
python main.py --train True --model PGL_SUM --batch_size 256 --epochs 200 --tag train_scratch
```

VASNet

```
python main.py --train True --model VASNet --batch_size 256 --epochs 200 --tag train_scratch
```

SL-module

```
python main.py --train True --model SL_module --batch_size 256 --lr 0.05 --epochs 200 --tag train_scratch
```
Furthermore, we provide trained checkpoints for each model for reproducibility.
Follow the command below to run inference from a trained checkpoint.

```
python main.py --train False --model <model_type> --ckpt_path <checkpoint file path> --tag inference
```
For example, if you download the VASNet checkpoint and place it inside the `dataset` folder, you can use the command as follows.

```
python main.py --train False --model VASNet --ckpt_path dataset/vasnet1_best_f1.pkl --tag vasnet_inference
```
We provide sample code for training and evaluating summarization models on Mr.HiSum.
Summarization model developers can test their own models by implementing PyTorch models under the `networks` folder.
We provide the `SimpleMLP` summarization model as a toy example.
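For orientation, a minimal frame-scoring model in the spirit of `SimpleMLP` might look like the sketch below. The class name, the 1024-dimensional input (matching YouTube-8M frame features), and the interface are assumptions; check the `networks` folder for the real example:

```python
import torch
import torch.nn as nn

class MyFrameScorer(nn.Module):
    """Toy per-frame importance scorer (hypothetical example)."""

    def __init__(self, input_dim=1024, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # importance score in [0, 1], like gtscore
        )

    def forward(self, features):
        # features: (batch, n_frames, input_dim) -> scores: (batch, n_frames)
        return self.mlp(features).squeeze(-1)
```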
You can train your model on the Mr.HiSum dataset using the command below. Modify or add new configurations to your taste!

```
python main.py --train True --batch_size 8 --epochs 50 --tag exp1
```
This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, following the YouTube-8M dataset. All Mr.HiSum dataset users must comply with the YouTube Terms of Service and the YouTube API Services Terms of Service.
This code builds on PGL-SUM, VASNet, and SL-module. Every part of the code taken from the original repositories follows the corresponding license. The license for our code can be found in LICENSE.
If you find this work useful in your research, please consider citing our paper:
```
@article{sul2024mr,
  title={Mr. HiSum: a large-scale dataset for video highlight detection and summarization},
  author={Sul, Jinhwan and Han, Jihoon and Lee, Joonseok},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
```