Custom COCO dataset in self-training

facebookresearch / CutLER

Code release for "Cut and Learn for Unsupervised Object Detection and Instance Segmentation" and "VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation"


Issue #23

Closed tianyufang1958 closed 1 year ago

tianyufang1958 commented 1 year ago

Thanks for the nice work. I have a question about using a custom COCO dataset in self-training. For my COCO data, I have instances_train.py and instances_val.py, and I registered two datasets, one for train and one for val. However, in the first step of self-training, --test-dataset only takes 'imagenet_train'.

Does this mean ImageNet uses only one JSON file for both training and validation? Or can the JSON generation in self-training only be applied to the training data, not the validation data? I am confused about this.

frank-xwang commented 1 year ago

Duplicate of https://github.com/facebookresearch/CutLER/issues/16. Please check https://github.com/facebookresearch/CutLER/issues/16 for more details on working with custom datasets.

Regarding the self-training dataset: you can train CutLER on any dataset you specify, but you must tell the model which dataset/split to work on by changing the command accordingly.
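One way to picture "telling the model which dataset/split to work on": every split is registered under its own name, and that name is what goes after --test-dataset in place of 'imagenet_train'. The sketch below is a pure-Python stand-in for that idea only; SplitSpec, REGISTRY, register_split, dataset_args_for, the my_coco_* names and all paths are hypothetical. In a Detectron2-based setup the real registration call would be detectron2.data.datasets.register_coco_instances(name, metadata, json_file, image_root); see issue #16 for how CutLER expects custom datasets to be wired in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSpec:
    """Where one dataset split lives on disk (COCO-style JSON + image folder)."""
    json_file: str
    image_root: str


# Hypothetical name -> split registry; each split gets its own unique name.
REGISTRY: dict[str, SplitSpec] = {}


def register_split(name: str, json_file: str, image_root: str) -> None:
    """Register one split under a unique name (refuses silent overwrites)."""
    if name in REGISTRY:
        raise ValueError(f"dataset {name!r} is already registered")
    REGISTRY[name] = SplitSpec(json_file, image_root)


def dataset_args_for(name: str) -> list[str]:
    """Return the '--test-dataset <name>' fragment for an already-registered split."""
    if name not in REGISTRY:
        raise KeyError(f"unknown dataset {name!r}; registered: {sorted(REGISTRY)}")
    return ["--test-dataset", name]


# Hypothetical custom splits that would replace 'imagenet_train'.
register_split("my_coco_train", "annotations/instances_train.json", "images/train")
register_split("my_coco_val", "annotations/instances_val.json", "images/val")

print(dataset_args_for("my_coco_train"))  # ['--test-dataset', 'my_coco_train']
```

Keeping one name per split makes the choice explicit: pointing the command at the validation split is just a matter of passing a different registered name.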

tianyufang1958 commented 1 year ago

@frank-xwang Sorry, maybe my question was not clear. I have split the dataset 80%/20% in COCO format and registered the two parts as training and validation datasets. The command below is only for the training dataset; should I also run it on the validation dataset to generate pseudo-labels? I just want to confirm this.

python maskcut.py \
  --vit-arch base --patch-size 8 \
  --tau 0.15 --fixed_size 480 --N 3 \
  --num-folder-per-job 1000 --job-index 0 \
  --dataset-path /path/to/dataset/traindir \
  --out-dir /path/to/save/annotations
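The --num-folder-per-job and --job-index flags let the MaskCut step be split across several jobs. A small sketch that generates one command per job index, assuming (not confirmed in this thread) that job k handles image sub-folders k*per_job up to (k+1)*per_job; maskcut_job_commands is a hypothetical helper and the paths are placeholders. Under that assumption, ImageNet's 1000 training class folders with --num-folder-per-job 1000 need only a single job, --job-index 0.

```python
import math
import shlex

# Flags copied from the maskcut.py command above; paths are placeholders.
BASE_ARGS = [
    "python", "maskcut.py",
    "--vit-arch", "base", "--patch-size", "8",
    "--tau", "0.15", "--fixed_size", "480", "--N", "3",
]


def maskcut_job_commands(num_folders: int, folders_per_job: int,
                         dataset_path: str, out_dir: str) -> list[str]:
    """Build one shell command per --job-index.

    ASSUMPTION: job k processes sub-folders
    [k * folders_per_job, (k + 1) * folders_per_job).
    """
    num_jobs = math.ceil(num_folders / folders_per_job)  # last job may be partial
    commands = []
    for job_index in range(num_jobs):
        args = BASE_ARGS + [
            "--num-folder-per-job", str(folders_per_job),
            "--job-index", str(job_index),
            "--dataset-path", dataset_path,
            "--out-dir", out_dir,
        ]
        commands.append(shlex.join(args))  # shell-safe quoting of every argument
    return commands


# e.g. 1000 class sub-folders spread over 4 jobs of 250 folders each
for cmd in maskcut_job_commands(1000, 250, "/path/to/dataset/traindir",
                                "/path/to/save/annotations"):
    print(cmd)
```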

frank-xwang commented 1 year ago

If you plan to use pseudo-masks for your validation dataset, then it is necessary to provide the path to the dataset that contains the validation split using the "--dataset-path" argument.
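Concretely, this means running MaskCut once per split. A sketch that builds both invocations, reusing the flags from the maskcut.py command quoted earlier in the thread and assuming the validation images live in their own directory; /path/to/dataset/valdir, the per-split output directories and maskcut_command are hypothetical placeholders.

```python
import shlex

# Flags copied from the maskcut.py command in this thread; paths are placeholders.
COMMON_ARGS = [
    "python", "maskcut.py",
    "--vit-arch", "base", "--patch-size", "8",
    "--tau", "0.15", "--fixed_size", "480", "--N", "3",
    "--num-folder-per-job", "1000", "--job-index", "0",
]

# Hypothetical layout: each split has its own image dir and its own output dir,
# so train and val pseudo-masks are never written to the same place.
SPLITS = {
    "train": ("/path/to/dataset/traindir", "/path/to/save/annotations/train"),
    "val": ("/path/to/dataset/valdir", "/path/to/save/annotations/val"),
}


def maskcut_command(dataset_path: str, out_dir: str) -> str:
    """One maskcut.py invocation for a single split."""
    return shlex.join(COMMON_ARGS + ["--dataset-path", dataset_path, "--out-dir", out_dir])


commands = {split: maskcut_command(src, dst) for split, (src, dst) in SPLITS.items()}
for split, cmd in commands.items():
    print(f"[{split}] {cmd}")
```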

tianyufang1958 commented 1 year ago

@frank-xwang My understanding is that we first use the whole image dataset to generate the pseudo-masks, and then split the dataset into training and validation sets (e.g. 80% and 20%) as the inputs to the phase 2 training. Could you please confirm whether this is correct?

frank-xwang commented 1 year ago

No. For self-training, we still use 100% of the data. Our experimental setup uses all of the ImageNet data as the training set and evaluates the model on 11 different detection datasets to demonstrate zero-shot unsupervised learning.
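Using 100% of the data means every pseudo-labelled image ends up in a single training annotation file, with no held-out split. If MaskCut was run as several jobs, one way to get there is to merge the per-job COCO-format outputs. A minimal sketch, assuming each file has "images", "annotations" and "categories" keys and that all files share the same categories; merge_coco and merge_coco_files are hypothetical helpers, not the repository's own merge tooling, and the job data in the example is toy data.

```python
import json
from pathlib import Path


def merge_coco(parts: list[dict]) -> dict:
    """Merge COCO-style annotation dicts into one, re-numbering ids so they stay unique.

    ASSUMPTION: every part uses the same 'categories' list.
    """
    merged = {"images": [], "annotations": [], "categories": []}
    next_img_id, next_ann_id = 1, 1
    for part in parts:
        if not merged["categories"]:
            merged["categories"] = part.get("categories", [])
        old_to_new = {}  # image ids are only unique within one part
        for img in part["images"]:
            old_to_new[img["id"]] = next_img_id
            merged["images"].append({**img, "id": next_img_id})
            next_img_id += 1
        for ann in part["annotations"]:
            # Point each annotation at its image's new id.
            merged["annotations"].append(
                {**ann, "id": next_ann_id, "image_id": old_to_new[ann["image_id"]]}
            )
            next_ann_id += 1
    return merged


def merge_coco_files(paths: list[str], out_path: str) -> None:
    """Load per-job JSON files, merge them, and write one training annotation file."""
    parts = [json.loads(Path(p).read_text()) for p in paths]
    Path(out_path).write_text(json.dumps(merge_coco(parts)))


# Toy data: two jobs whose image/annotation ids collide before merging.
job0 = {"images": [{"id": 1, "file_name": "a.jpg"}],
        "annotations": [{"id": 1, "image_id": 1}],
        "categories": [{"id": 1, "name": "fg"}]}
job1 = {"images": [{"id": 1, "file_name": "b.jpg"}],
        "annotations": [{"id": 1, "image_id": 1}, {"id": 2, "image_id": 1}],
        "categories": [{"id": 1, "name": "fg"}]}
merged = merge_coco([job0, job1])
print(len(merged["images"]), len(merged["annotations"]))  # 2 3
```

Re-numbering is needed because each job numbers its images and annotations independently, so raw concatenation would produce colliding ids.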

frank-xwang commented 1 year ago

Closing it now. Please feel free to reopen it if you have further questions.