We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF)-based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision are limited to either synthetic environments or a narrow selection of real-world scenes, and are therefore insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also limits what can be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset featuring 51.2 million frames from 10,510 videos captured at 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future NVS research. In addition, we obtained encouraging results in a pilot study on learning generalizable NeRF from DL3DV-10K, which demonstrates the necessity of a large-scale scene-level dataset for forging a path toward a foundation model for 3D representation learning.
We report the performance of the main state-of-the-art (SOTA) methods as of fall 2023 on our large-scale NVS benchmark. The quantitative results are shown below. Please refer to our paper for more details (e.g., additional quantitative and qualitative results).
Performance on the benchmark. Each metric is the mean over the 140 benchmark scenes, evaluated at a scale factor of 4. Zip-NeRF uses its default batch size (65536), while Zip-NeRF* uses the same batch size as the other methods (4096). Note that training time and memory usage may vary depending on the configuration.
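As a rough illustration of how a benchmark-wide PSNR number of this kind is aggregated, the sketch below computes PSNR per test image, averages within each scene, and then averages across scenes. It assumes images are float arrays in [0, 1] that have already been resized to the evaluation scale; the per-scene-then-overall averaging order and the function names are assumptions for illustration, not the benchmark's evaluation code.

```python
from __future__ import annotations

import numpy as np


def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def benchmark_mean_psnr(scenes: dict[str, list[tuple[np.ndarray, np.ndarray]]]) -> float:
    """Average PSNR over the test views of each scene, then over all scenes.

    `scenes` maps a (hypothetical) scene id to a list of
    (prediction, ground truth) image pairs at the evaluation scale.
    """
    per_scene = [np.mean([psnr(p, g) for p, g in pairs]) for pairs in scenes.values()]
    return float(np.mean(per_scene))
```

For example, a scene whose predictions are off by a constant 0.1 everywhere scores 20 dB, and one off by 0.01 scores 40 dB, so a benchmark made of just those two scenes would report a mean of 30 dB.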
(A) shows the density plots of PSNR and SSIM, and their relationship, on the benchmark for each method. (B) compares performance by scene complexity. The number above each bar is the mean value of the methods for that attribute.
DL3DV-10K has more than 10K high-quality videos that cover diverse real-world scenes for 3D vision tasks.
We have formulated the following requirements as guidelines for recording high-quality scene-level videos:
We provide a preview page here that shows a snapshot of each scene along with its hash code and labels. Some missing labels will be added soon.
[x] Sample videos available for free download (11 scenes)
[ ] Benchmark dataset release (140 scenes)
[x] 10K full dataset release: the whole dataset is extremely large, so several versions are available for different needs.
Please go to the relevant Hugging Face dataset page and request access. By requesting access, you agree to our Terms of Use and license, and can then access the dataset. Note that the latest license permits use of the dataset, but users are responsible for ensuring that their use is appropriate. The DL3DV organization disclaims any responsibility for the misuse, inappropriate use, or unethical application of the dataset by individuals or entities who download or access it. More details can be found in our license.
If you have enough disk space, you can use git to download a dataset from Hugging Face (see this link). The 480P and 960P versions should satisfy most needs.
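If you would rather stay in Python than use git, `huggingface_hub.snapshot_download` can fetch a whole dataset repository, or only the files matching `allow_patterns`. The sketch below assumes `huggingface_hub` is installed, that your access request has been approved, and that you are authenticated (for example with `huggingface-cli login`). The repository id `DL3DV/DL3DV-ALL-480P` is an assumption, so check the actual repo name on the Hugging Face dataset page; `download_kwargs` and `fetch` are hypothetical helper names.

```python
from __future__ import annotations

# Hypothetical repository id: replace it with the dataset repo you were granted access to.
REPO_ID = "DL3DV/DL3DV-ALL-480P"


def download_kwargs(repo_id: str, local_dir: str,
                    patterns: list[str] | None = None) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {"repo_id": repo_id, "repo_type": "dataset", "local_dir": local_dir}
    if patterns:
        # Restrict the download to files whose repo paths match these glob patterns.
        kwargs["allow_patterns"] = patterns
    return kwargs


def fetch(repo_id: str = REPO_ID, local_dir: str = "DL3DV-10K",
          patterns: list[str] | None = None) -> str:
    """Download (part of) a gated dataset repo and return the local path."""
    # Imported lazily so the helpers above can be used without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(repo_id, local_dir, patterns))
```

Calling `fetch()` with no patterns is roughly equivalent to cloning the whole repository; passing `patterns` limits the download, but the right patterns depend on the repo's folder layout, which this sketch does not assume.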
If you do not have enough space, we also provide a download script here for downloading a subset. First, make sure you have applied for access (see above). To set up the environment for the script, run the following in your Python virtual environment:
pip install huggingface_hub tqdm pandas
Usage of download.py:
usage: download.py [-h] --odir ODIR --subset {1K,2K,3K,4K,5K,6K,7K,8K,9K,10K} --resolution {4K,2K,960P,480P} --file_type {images+poses,video,colmap_cache} [--hash HASH]
                   [--clean_cache]

optional arguments:
  -h, --help            show this help message and exit
  --odir ODIR           output directory
  --subset {1K,2K,3K,4K,5K,6K,7K,8K,9K,10K}
                        The subset of the benchmark to download
  --resolution {4K,2K,960P,480P}
                        The resolution to download
  --file_type {images+poses,video,colmap_cache}
                        The file type to download
  --hash HASH           If set subset=hash, this is the hash code of the scene to download
  --clean_cache         If set, will clean the huggingface cache to save space
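To make the option constraints above concrete, here is a minimal argparse sketch that mirrors the documented interface; it is not the actual download.py. Going by the `--hash` help text, it assumes `--subset` also accepts the value `hash`, in which case `--hash` must be given. The choices listed in the usage line do not show `hash`, so verify this against the real script.

```python
from __future__ import annotations

import argparse

# "hash" is an assumption taken from the --hash help text, not from the usage line.
SUBSETS = [f"{n}K" for n in range(1, 11)] + ["hash"]
RESOLUTIONS = ["4K", "2K", "960P", "480P"]
FILE_TYPES = ["images+poses", "video", "colmap_cache"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="download.py")
    parser.add_argument("--odir", required=True, help="output directory")
    parser.add_argument("--subset", required=True, choices=SUBSETS,
                        help="the subset to download")
    parser.add_argument("--resolution", required=True, choices=RESOLUTIONS,
                        help="the resolution to download")
    parser.add_argument("--file_type", required=True, choices=FILE_TYPES,
                        help="the file type to download")
    parser.add_argument("--hash", help="scene hash code, used when --subset hash")
    parser.add_argument("--clean_cache", action="store_true",
                        help="clean the Hugging Face cache to save space")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse arguments and enforce that --hash accompanies --subset hash."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.subset == "hash" and not args.hash:
        parser.error("--hash is required when --subset hash")  # exits with status 2
    return args
```

For instance, `parse(["--odir", "DL3DV-10K", "--subset", "1K", "--resolution", "480P", "--file_type", "images+poses", "--clean_cache"])` returns a namespace with `subset="1K"` and `clean_cache=True`, matching the first example below.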
Here are some examples:
# Make sure you have applied for the access.
# Use this to download the download.py script
wget https://raw.githubusercontent.com/DL3DV-10K/Dataset/main/scripts/download.py
# Download 480P resolution images and poses, 0~1K subset, output to DL3DV-10K directory
python download.py --odir DL3DV-10K --subset 1K --resolution 480P --file_type images+poses --clean_cache
# Download 960P resolution images and poses, 0~1K subset, output to DL3DV-10K directory
python download.py --odir DL3DV-10K --subset 1K --resolution 960P --file_type images+poses --clean_cache
# Download 2K resolution images and poses, 0~1K subset, output to DL3DV-10K directory
python download.py --odir DL3DV-10K --subset 1K --resolution 2K --file_type images+poses --clean_cache
# Download 4K resolution images and poses, 0~1K subset, output to DL3DV-10K directory
python download.py --odir DL3DV-10K --subset 1K --resolution 4K --file_type images+poses --clean_cache
# Download 4K resolution videos, 0~1K subset, output to DL3DV-10K directory
python download.py --odir DL3DV-10K --subset 1K --resolution 4K --file_type video --clean_cache
# Download 480P resolution images and poses, 1K~2K subset, output to DL3DV-10K directory
python download.py --odir DL3DV-10K --subset 2K --resolution 480P --file_type images+poses --clean_cache
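The examples above fetch one subset per call. To fetch several subsets in sequence, a small wrapper can invoke download.py once per subset using only the flags documented above. This sketch assumes download.py sits in the current directory and that your access has been approved; `build_command` and `download_subsets` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess
import sys


def build_command(subset: str, resolution: str = "480P",
                  file_type: str = "images+poses", odir: str = "DL3DV-10K",
                  clean_cache: bool = True) -> list[str]:
    """Return the download.py command line for one subset."""
    cmd = [sys.executable, "download.py", "--odir", odir, "--subset", subset,
           "--resolution", resolution, "--file_type", file_type]
    if clean_cache:
        cmd.append("--clean_cache")
    return cmd


def download_subsets(subsets: list[str], **options) -> None:
    """Run download.py for each subset in order, stopping at the first failure."""
    for subset in subsets:
        cmd = build_command(subset, **options)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `download_subsets(["1K", "2K", "3K"])` downloads 480P images and poses for the 0~1K, 1K~2K, and 2K~3K subsets into DL3DV-10K, cleaning the cache after each call. Running the subsets sequentially keeps peak cache usage close to that of a single subset.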
DL3DV-10K is released under the DL3DV-10K Terms of Use. The DL3DV-10K Terms of Use, disclaimer, and a copy of the license are available in this repository.
Copyright (c) 2024
Despite our best efforts to anonymize the data, some sensitive details may have been inadvertently included. If you identify any such issues in the dataset scenes, please don't hesitate to contact us via the issue link. We will manually redact any sensitive information to ensure the privacy and integrity of the dataset.
Want to contribute to the DL3DV-10K dataset? Upload your video here.
The DL3DV-10K team is a non-profit organization whose members include the authors of the DL3DV-10K paper and volunteers who contribute to the dataset. Our mission is to make large-scale deep learning models and datasets available to the general public.
If you find this dataset useful, please cite our paper.
@inproceedings{ling2024dl3dv,
title={Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision},
author={Ling, Lu and Sheng, Yichen and Tu, Zhi and Zhao, Wentian and Xin, Cheng and Wan, Kun and Yu, Lantao and Guo, Qianyu and Yu, Zixun and Lu, Yawen and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={22160--22169},
year={2024}
}