Holopix50k: A Large-Scale In-the-wild Stereo Image Dataset

Presented at CVPR 2020 Workshop on Computer Vision for Augmented and Virtual Reality

Owen Hua, Puneet Kohli, Pritish Uplavikar *, Anand Ravi *, Saravana Gunaseelan, Jason Orozco, Edward Li
Leia Inc.
* Denotes equal contribution

With the mass-market adoption of dual-camera mobile phones, leveraging stereo information in computer vision has become increasingly important. Current state-of-the-art methods utilize learning-based algorithms, where the amount and quality of training samples heavily influence results. Existing stereo image datasets are limited either in size or subject variety. Hence, algorithms trained on such datasets do not generalize well to scenarios encountered in mobile photography. We present Holopix50k, a novel in-the-wild stereo image dataset, comprising 49,368 image pairs contributed by users of the Holopix™ mobile social platform.

Downloading the dataset

Linux/MacOS

To download the Holopix50k dataset, run the following commands in a Python 3 environment. You also need either wget or curl installed on your machine.

To download the complete dataset, run scripts/download_holopix50k.sh with the download path as follows:

./scripts/download_holopix50k.sh <DOWNLOAD_PATH>

You can also choose to download only a single dataset split by passing it as an optional argument to the script:

./scripts/download_holopix50k.sh <DOWNLOAD_PATH> [train|test|val]

The above commands download the dataset into <DOWNLOAD_PATH>/Holopix50k.
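
After a download finishes, a quick sanity check can confirm which splits actually landed under <DOWNLOAD_PATH>/Holopix50k. The following is a minimal Python sketch, assuming each split appears as a subdirectory named after it (the bucket path gs://holopix50k-dataset/Holopix50k/[SPLIT] used in the Windows instructions suggests this layout). The function name find_splits is hypothetical and not part of this repository.

```python
from pathlib import Path

# Split names accepted by the download script.
SPLITS = ("train", "test", "val")


def find_splits(download_path):
    """Return the splits present under <download_path>/Holopix50k.

    Assumption: each split is stored as a subdirectory named after it,
    e.g. <download_path>/Holopix50k/train.
    """
    root = Path(download_path) / "Holopix50k"
    if not root.is_dir():
        raise FileNotFoundError(f"expected dataset root at {root}")
    # Keep the canonical train/test/val order in the result.
    return [split for split in SPLITS if (root / split).is_dir()]


if __name__ == "__main__":
    import sys

    # Usage: python check_download.py <DOWNLOAD_PATH>
    present = find_splits(sys.argv[1])
    missing = [s for s in SPLITS if s not in present]
    print("present:", present)
    print("missing:", missing)
```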

Note that the script temporarily installs the gsutil tool to download the dataset. If you run into issues installing gsutil, see the official gsutil installation guide.

Windows

To download the dataset on Windows, you need Python installed on your machine. Once Python is set up, download the gsutil archive and extract it to a directory of your choice, referred to below as GSUTIL_ROOT (for example, C:\gsutil).

Now run the following command to download the complete Holopix50k dataset:

python [GSUTIL_ROOT]\gsutil -m cp -n -r gs://holopix50k-dataset/Holopix50k <DOWNLOAD_PATH>

To download a single SPLIT ("train", "test" or "val") of the Holopix50k dataset, append the split name to the source path in the above command:

python [GSUTIL_ROOT]\gsutil -m cp -n -r gs://holopix50k-dataset/Holopix50k/[SPLIT] <DOWNLOAD_PATH>

If you run into issues installing gsutil, follow the official gsutil installation guide.
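
If you prefer to script the Windows download, the gsutil commands above can be assembled from Python. This is a minimal sketch, not an official helper: gsutil_command is a hypothetical name, and it assumes the extracted gsutil entry point sits at GSUTIL_ROOT\gsutil exactly as in the commands shown. It reuses the same flags: -m for parallel transfers, cp -n to skip files that already exist at the destination, and -r to copy recursively.

```python
import subprocess
import sys
from pathlib import Path

BUCKET_ROOT = "gs://holopix50k-dataset/Holopix50k"
SPLITS = ("train", "test", "val")


def gsutil_command(gsutil_root, download_path, split=None):
    """Build the argument list for the gsutil copy shown above.

    gsutil_root: directory the gsutil archive was extracted to (GSUTIL_ROOT).
    split: None for the complete dataset, or one of "train", "test", "val".
    """
    source = BUCKET_ROOT
    if split is not None:
        if split not in SPLITS:
            raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
        source = f"{BUCKET_ROOT}/{split}"
    gsutil = str(Path(gsutil_root) / "gsutil")
    # Run gsutil with the current Python interpreter, mirroring "python [GSUTIL_ROOT]\gsutil".
    # -m: parallel transfers; -n: skip existing files; -r: recursive copy.
    return [sys.executable, gsutil, "-m", "cp", "-n", "-r", source, str(download_path)]


if __name__ == "__main__":
    # Usage: python fetch_holopix50k.py <GSUTIL_ROOT> <DOWNLOAD_PATH> [train|test|val]
    split = sys.argv[3] if len(sys.argv) > 3 else None
    subprocess.run(gsutil_command(sys.argv[1], sys.argv[2], split), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect (or print) before starting a large transfer.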

Dataset size

Note that the number of images you are able to download may be smaller than the original dataset size of 49,368 stereo image pairs. Holopix50k is a crowdsourced dataset from the Holopix social media platform, and the original user (who posts the image on Holopix) retains the copyright of the images they post, as stated in our LICENSE. Hence, if a user deletes an image from Holopix, it is also removed from our dataset and is no longer available for download. Other crowdsourced datasets (e.g., WSVD) operate in the same way.
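
Because the number of available images can shrink over time, it may help to check how many complete stereo pairs a local copy contains. The Python sketch below assumes a hypothetical layout of <split>/left and <split>/right folders whose matching files differ only by a _left/_right name suffix; verify this against your own download before relying on it. count_pairs and pair_key are illustrative names, not part of the Holopix50k tooling.

```python
from pathlib import Path

ORIGINAL_PAIRS = 49_368  # size of the original Holopix50k release


def pair_key(path, side):
    """Strip an optional '_left'/'_right' suffix so both halves share a key."""
    stem = path.stem
    suffix = f"_{side}"
    return stem[: -len(suffix)] if stem.endswith(suffix) else stem


def count_pairs(dataset_root, splits=("train", "test", "val")):
    """Return {split: (complete_pairs, unmatched_images)}.

    Assumed (hypothetical) layout: <dataset_root>/<split>/left/* and
    <dataset_root>/<split>/right/*, with matching names apart from the suffix.
    Missing split or side folders are treated as empty.
    """
    report = {}
    for split in splits:
        keys = {}
        for side in ("left", "right"):
            side_dir = Path(dataset_root) / split / side
            files = side_dir.iterdir() if side_dir.is_dir() else []
            keys[side] = {pair_key(p, side) for p in files if p.is_file()}
        complete = keys["left"] & keys["right"]  # both halves present
        unmatched = keys["left"] ^ keys["right"]  # only one half present
        report[split] = (len(complete), len(unmatched))
    return report


if __name__ == "__main__":
    import sys

    # Usage: python count_pairs.py <DOWNLOAD_PATH>/Holopix50k
    report = count_pairs(sys.argv[1])
    total = sum(complete for complete, _ in report.values())
    for split, (complete, unmatched) in report.items():
        print(f"{split}: {complete} pairs, {unmatched} unmatched images")
    print(f"total: {total} of {ORIGINAL_PAIRS} original pairs ({total / ORIGINAL_PAIRS:.1%})")
```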

Citation

If you use the Holopix50k dataset in your work, please cite our paper:

@InProceedings{hua2020holopix50k,
author = {Yiwen Hua and Puneet Kohli and Pritish Uplavikar and Anand Ravi and Saravana Gunaseelan and Jason Orozco and Edward Li},
title = {Holopix50k: A Large-Scale In-the-wild Stereo Image Dataset},
booktitle = {CVPR Workshop on Computer Vision for Augmented and Virtual Reality, Seattle, WA, 2020.},
month = {June},
year = {2020}
}