facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License
3.25k stars, 331 forks

Questions about labels in SSL and VISSL #472

Open sarmientoj24 opened 2 years ago

sarmientoj24 commented 2 years ago

I am planning to pretrain a ResNet-50 backbone using VISSL for object detection. From the custom dataset documentation, it looks like you have to incorporate labels within your data.

Questions:

  1. For a downstream task of object detection, what do these labels entail?
  2. My intuition is that SSL requires no labels, which is why it is called self-supervised. So what are these labels for?
  3. And what is the lbl_path format? Is it a txt file that points to the labels for each img?

Labels in VISSL

"imagenet1k_folder": {
        "train": ["<img_path>", "<lbl_path>"],
        "val": ["<img_path>", "<lbl_path>"]
    },

And in the folder layout below, the n0, n1, ... directories are labels, right?

imagenet_full_size
|_ train
|  |_ <n0......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-N-name>.JPEG
|  |_ ...
|  |_ <n1......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-M-name>.JPEG
|  |  |_...
|  |  |_...
|_ val
|  |_ <n0......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-N-name>.JPEG
|  |_ ...
|  |_ <n1......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-M-name>.JPEG
|  |  |_...
|  |  |_...
iseessel commented 2 years ago

1. For a downstream task of object detection, what do these labels entail?

We use Detectron2 (https://github.com/facebookresearch/detectron2) for object detection. I'd recommend taking a look at their repo and docs in addition to ours (https://vissl.readthedocs.io/en/v0.1.6/evaluations/object_detection.html?highlight=detection#benchmark-task-object-detection).

2. My intuition with SSL is that there are no labels required that's why it is SSL. But what are these labels for?

Hi @sarmientoj24, we support fine-tuning and linear evaluation in addition to SSL pre-training. Fine-tuning and linear evaluation require labels, while SSL pre-training does not. You do not always have to supply labels; for example, you can launch an SSL pretraining run with the following command:

python3 tools/run_distributed_engines.py \
    config=test/integration_test/quick_swav.yaml \
    config.DATA.TRAIN.DATA_SOURCES=[disk_filelist] \
    config.DATA.TRAIN.DATASET_NAMES=[imagenet1k_filelist] \
    config.DATA.TEST.DATA_SOURCES=[disk_filelist] \
    config.DATA.TEST.DATASET_NAMES=[imagenet1k_filelist]

3. And what is the lbl_path format? Is it a txt file that points to the labels for each img?

I'd recommend looking at our documentation here: https://vissl.readthedocs.io/en/v0.1.6/vissl_modules/data.html?highlight=data#using-data and tutorial here: https://vissl.ai/tutorials/Benchmark_Linear_Image_Classification_on_ImageNet_1K_V0_1_6.

We have two main supported formats: disk_filelist (.npy file of ints or strings) or disk_folder (path to folder with structure as below).

imagenet_full_size
|_ train
|  |_ <n0......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-N-name>.JPEG
|  |_ ...
|  |_ <n1......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-M-name>.JPEG
|  |  |_...
|  |  |_...
|_ val
|  |_ <n0......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-N-name>.JPEG
|  |_ ...
|  |_ <n1......>
|  |  |_<im-1-name>.JPEG
|  |  |_...
|  |  |_<im-M-name>.JPEG
|  |  |_...
|  |  |_...

4. The n0, n1, ... are labels, right?

Yes, these are labels.
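
Since the directory names act as labels in the disk_folder layout, the same data can be flattened into the image and label lists that disk_filelist expects. The following is a minimal sketch of that conversion, not VISSL code: the helper name folder_to_filelists, the accepted file extensions, and the temporary demo tree are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def folder_to_filelists(root, extensions=(".jpeg", ".jpg", ".png")):
    """Hypothetical helper: walk root/<label>/<image> and return
    (image_paths, string_labels), one label per image."""
    images, labels = [], []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(extensions):
                images.append(os.path.join(class_dir, fname))
                labels.append(class_name)  # the folder name is the label
    return images, labels


# Build a tiny throwaway tree that mirrors the layout above (demo only).
train_dir = os.path.join(tempfile.mkdtemp(), "train")
for class_name, names in {"n0": ["a.JPEG", "b.JPEG"], "n1": ["c.JPEG"]}.items():
    os.makedirs(os.path.join(train_dir, class_name))
    for name in names:
        open(os.path.join(train_dir, class_name, name), "w").close()

images, labels = folder_to_filelists(train_dir)

# Save both lists as .npy files, the format described for disk_filelist.
np.save(os.path.join(train_dir, "images.npy"), np.array(images))
np.save(os.path.join(train_dir, "labels.npy"), np.array(labels))
```

For label-free SSL pretraining, only the image list would be needed; the label list matters once you move on to linear evaluation or fine-tuning.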

sarmientoj24 commented 2 years ago

We have two main supported formats: disk_filelist (.npy file of ints or strings)

Are these List[str]? And is it a pickled numpy object saved through np.save(...)?

Is the format for imagenet1k_filelist the same as disk_filelist?

iseessel commented 2 years ago

Yes, that's all correct. Filelists can either be List[str] or List[int]. For example:

import numpy as np

# One entry per image; labels can be either ints or strings.
images = ["/path/to/img_0", "/path/to/img_1"]
int_labels = [0, 0]
str_labels = ["label_0", "label_0"]

# Save each list as a .npy file.
np.save("images.npy", np.array(images))
np.save("int_labels.npy", np.array(int_labels))
np.save("str_labels.npy", np.array(str_labels))
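
As a sanity check before training, the saved files can be loaded back and compared. The sketch below round-trips the example lists through np.save and np.load, then builds a catalog entry as a plain dict. The entry assumes that imagenet1k_filelist uses the same two-element [image path, label path] shape as the imagenet1k_folder entry quoted in the question; the temporary output directory and all variable names are illustrative only.

```python
import json
import os
import tempfile

import numpy as np

images = ["/path/to/img_0", "/path/to/img_1"]
int_labels = [0, 0]

# Write to a temporary directory so the sketch has no side effects.
out_dir = tempfile.mkdtemp()
img_file = os.path.join(out_dir, "images.npy")
lbl_file = os.path.join(out_dir, "int_labels.npy")
np.save(img_file, np.array(images))
np.save(lbl_file, np.array(int_labels))

# Arrays of Python strings get a fixed-width Unicode dtype, so they load
# back with the default np.load settings (no allow_pickle needed).
loaded_images = np.load(img_file)
loaded_labels = np.load(lbl_file)

# Each image needs exactly one label.
assert len(loaded_images) == len(loaded_labels)
print(loaded_images.dtype.kind, loaded_labels.dtype.kind)  # U i

# Assumed entry shape, copied from the imagenet1k_folder example above.
catalog_entry = {
    "imagenet1k_filelist": {
        "train": [img_file, lbl_file],
        "val": [img_file, lbl_file],
    }
}
print(json.dumps(catalog_entry, indent=4))
```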