facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License
3.25k stars 331 forks

The difference between DATASET_NAMES, DATA_SOURCES, DATA_PATHS, and LABEL_SOURCES #473

Open sarmientoj24 opened 2 years ago

sarmientoj24 commented 2 years ago

I came across this config for training in VISSL, but I can't work out the difference between these parameters. For example, what does imagenet1k_folder look like, and what should it contain? What does disk_folder mean, and what does it hold? The same question applies to DATA_PATHS.

config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_folder] \
config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
config.DATA.TRAIN.DATA_PATHS=["/path/to/my/imagenet/folder/train"] \
config.DATA.TRAIN.LABEL_SOURCES=[disk_folder]

iseessel commented 2 years ago

I would recommend looking at our docs: https://vissl.readthedocs.io/en/v0.1.6/vissl_modules/data.html?highlight=Data#using-data and some of our tutorials: https://vissl.ai/tutorials/Feature_Extraction_V0_1_6.

The imagenet1k folder should have the following structure:

imagenet_full_size
|_ train
|  |_ <n0......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-N-name>.JPEG
|  |_ ...
|  |_ <n1......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-M-name>.JPEG
|  |  |_...
|  |  |_...
|_ val
|  |_ <n0......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-N-name>.JPEG
|  |_ ...
|  |_ <n1......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-M-name>.JPEG
|  |  |_...
|  |  |_...
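
To check that a folder matches the layout above before pointing DATA_PATHS at it, something like the following rough sketch may help. It is not part of VISSL: `summarize_split` and `build_demo_tree` are hypothetical helper names, the class folder names (`n0_demo`, `n1_demo`) are made up for the demo, and the set of accepted image extensions is an assumption. It only relies on the idea shown in the tree, namely that each split (train, val) contains one subfolder per class and that the images sit directly inside those subfolders.

```python
import tempfile
from pathlib import Path

# Assumption: which file suffixes count as images. Adjust for your data.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Return {class_folder_name: image_count} for one split (e.g. train/)."""
    split = Path(split_dir)
    summary = {}
    # Every direct subfolder of the split is treated as one class.
    for class_dir in sorted(p for p in split.iterdir() if p.is_dir()):
        summary[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return summary


def build_demo_tree(root):
    """Create a tiny folder tree with empty placeholder files (demo only)."""
    root = Path(root)
    layout = {
        "train": {"n0_demo": 2, "n1_demo": 1},
        "val": {"n0_demo": 1, "n1_demo": 1},
    }
    for split, classes in layout.items():
        for class_name, n_images in classes.items():
            class_dir = root / split / class_name
            class_dir.mkdir(parents=True)
            for i in range(n_images):
                (class_dir / f"im-{i + 1}.JPEG").touch()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_root = build_demo_tree(tmp)
        for split_name in ("train", "val"):
            print(split_name, summarize_split(demo_root / split_name))
```

On a real dataset you would call `summarize_split("/path/to/my/imagenet/folder/train")` with the same path you pass in DATA_PATHS. A split with no class subfolders, or class folders reporting zero images, suggests the path points one level too high or too low.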