[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"
https://silentview.github.io/LVD-2M/

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Official GitHub repository of "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"

Tianwei Xiong1,*, Yuqing Wang1,*, Daquan Zhou2,†, Zhijie Lin2, Jiashi Feng2, Xihui Liu1,✉

1The University of Hong Kong, 2ByteDance
*Equal contribution. †Project lead. ✉Corresponding author.

NeurIPS 2024 Datasets and Benchmarks Track

arXiv Project Page

News

[2024/10/15] The dataset, the research paper and the project page are released!

Introduction

LVD-2M is a dataset featuring:

  1. long videos covering at least 10 seconds
  2. long-take videos without cuts
  3. large motion and diverse contents
  4. temporally dense captions.

Dataset Statistics

[Figure: dataset statistics]

Dataset Access

Quick Walk Through for 100 Randomly Sampled Videos

We randomly sampled 100 videos (YouTube source) from LVD-2M; users can download the videos and the annotation file.

Note that even this purely random sample, with no cherry-picking, is of decent quality.

We will remove video samples from our dataset / demonstration if you find them inappropriate. To request removal, please contact xiongt20 at gmail dot com.

File Downloading

We provide three splits of our video dataset according to their sources: YouTube, HDVG and WebVid.

You can download the three files from the links

The meta records should be put in the following paths:

Explanations for the Fields of the Meta Files:

Each row in the CSV file corresponds to a video clip. The columns are:
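The header row of a downloaded meta file can also be inspected directly. The following is a minimal sketch using only the Python standard library; the helper name `read_header_and_first_row` and the path `path/to/meta.csv` are hypothetical placeholders, and no particular column names are assumed.

```python
import csv
from pathlib import Path


def read_header_and_first_row(csv_path):
    """Return (column_names, first_row_dict) for a CSV meta file.

    first_row_dict is None when the file has a header but no data rows.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        first_row = next(reader, None)
        return list(reader.fieldnames or []), first_row


if __name__ == "__main__":
    # Hypothetical path: replace with the location of your meta file.
    meta_path = Path("path/to/meta.csv")
    if meta_path.exists():
        columns, row = read_header_and_first_row(meta_path)
        print("columns:", columns)
        print("first clip:", row)
    else:
        print(f"{meta_path} not found")
```

Reading only the first row keeps the check fast even for a meta file with a very large number of clips.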

Environment

conda create --name lvd2m python=3.9
conda activate lvd2m

# install ffmpeg
sudo apt-get install ffmpeg

pip install -r requirements.txt
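
Before launching a long download job, it can help to confirm that ffmpeg is actually visible on `PATH` inside the activated environment. The following is a small sketch using only the standard library; the helper names are hypothetical and not part of this repository.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(ffmpeg_path):
    """Return the first line printed by `ffmpeg -version` (empty if none)."""
    result = subprocess.run(
        [ffmpeg_path, "-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it before downloading videos.")
    else:
        print(ffmpeg_version_line(path))
```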

Video Downloading Script

To download videos from a csv file, run the following command:

${PYTHON_PATH} \
download_videos_release.py \
--bsz=96 \
--resolution=720p \
--node_num=1 \
--node_id=0 \
--process_num=96 \
--workdir=cache/download_cache \
--out_dir="dataset/videos" \
--dataset_key="hdvg" \
--multiprocess
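
The `--node_num` and `--node_id` flags suggest that the clip list is split across machines, with each machine running the same command under a different `--node_id`. How `download_videos_release.py` actually partitions the work is not described here, so the sketch below only illustrates one common scheme (strided sharding) under that assumption; `shard_for_node` is a hypothetical helper, not a function from the script.

```python
def shard_for_node(items, node_num, node_id):
    """Return the items a given node would handle under strided sharding.

    Assumption for illustration: node k takes items k, k + node_num,
    k + 2 * node_num, and so on. The real script may split differently.
    """
    if node_num < 1:
        raise ValueError("node_num must be at least 1")
    if not 0 <= node_id < node_num:
        raise ValueError("node_id must satisfy 0 <= node_id < node_num")
    return items[node_id::node_num]


if __name__ == "__main__":
    clip_ids = list(range(10))  # stand-in for rows of a meta file
    for node_id in range(3):
        print(node_id, shard_for_node(clip_ids, node_num=3, node_id=node_id))
    # 0 [0, 3, 6, 9]
    # 1 [1, 4, 7]
    # 2 [2, 5, 8]
```

Under this scheme every clip is handled by exactly one node, and shard sizes differ by at most one.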

Your Google accounts may be banned or suspended for making too many requests, so we suggest using multiple accounts. Set `ACCOUNT_NUM` in `download_videos_release.py` to specify the number of accounts.
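
To see how spreading download processes over several accounts keeps the per-account load balanced, consider a round-robin assignment. This is a sketch under assumptions: the function `account_for_process` and the round-robin rule are illustrative and are not taken from `download_videos_release.py`; `account_num` plays the role of `ACCOUNT_NUM`.

```python
from collections import Counter


def account_for_process(node_id, local_rank, process_num, account_num):
    """Map a process to an account index by round-robin over global rank.

    Assumption for illustration: each node runs `process_num` processes,
    and the global rank is node_id * process_num + local_rank.
    """
    if account_num < 1:
        raise ValueError("account_num must be at least 1")
    if not 0 <= local_rank < process_num:
        raise ValueError("local_rank must satisfy 0 <= local_rank < process_num")
    global_rank = node_id * process_num + local_rank
    return global_rank % account_num


if __name__ == "__main__":
    # 2 nodes with 4 processes each, shared across 3 accounts.
    load = Counter(
        account_for_process(node_id, rank, process_num=4, account_num=3)
        for node_id in range(2)
        for rank in range(4)
    )
    print(sorted(load.items()))  # [(0, 3), (1, 3), (2, 2)]
```

With this rule the number of processes per account differs by at most one, so no single account receives a disproportionate share of requests.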

Details for Video Downloading

We don't provide the video data directly; instead, we provide ways to download the videos from their original sources.

Although the HDVG dataset is also sourced from YouTube, its format differs from that of other YouTube-scraped datasets, so it is treated separately.

### Technical suggestions for downloading videos from YouTube

We use a modified version of [pytube](https://github.com/pytube/pytube) to download the videos. It supports downloading videos from YouTube in a parallel, fast and stable way (using multiprocessing and multiple accounts). For more details, check the `download_videos_release.py` script.

Overall, we suggest that users prepare multiple Google accounts, run `python download_videos_release.py --reset_auth` for authorization, and then run the downloading scripts.

We implemented a mechanism that divides the request load among multiple accounts: the processes launched on all nodes are evenly assigned to the different accounts.

*Note: the code for downloading videos from YouTube may fail due to changes in YouTube API behavior; check the issues in [pytube](https://github.com/pytube/pytube) for updates.*

### Disclaimer about WebVid

We **don't provide** code for downloading videos from **WebVid** (whose videos are from stock footage providers) for two reasons:

  1. Users can directly access these video clips through the provided URLs, which is much simpler than for video clips from YouTube.
  2. To avoid possible copyright violations.

License

The video data is collected from publicly available resources. The license of this dataset is the same as the license of HD-VILA.

Acknowledgements

Here we list the projects that inspired and helped us to build LVD-2M.

Citation

@article{xiong2024lvd2m,
      title={LVD-2M: A Long-take Video Dataset with Temporally Dense Captions}, 
      author={Tianwei Xiong and Yuqing Wang and Daquan Zhou and Zhijie Lin and Jiashi Feng and Xihui Liu},
      year={2024},
      journal={arXiv preprint arXiv:2410.10816}
}