Hi, thanks for your interest in this work.
You can run SwAV with only one GPU. You will need to apply the linear scaling rule to adjust the learning rate to your actual batch size, or perform a grid search. I would also suggest using a queue (see https://github.com/facebookresearch/swav#training-gets-unstable-when-using-the-queue for help on how to adjust the queue parameters).
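As a rough illustration of the linear scaling rule (the reference batch size and learning rate below are assumptions, not values from your setup; plug in whichever multi-GPU config you are scaling down from):

```python
# Linear scaling rule sketch for a single-GPU run.
# Assumed reference config: effective batch size 512 trained with base_lr 0.6;
# replace these with the actual reference values you start from.
ref_batch_size = 512
ref_base_lr = 0.6

my_batch_size = 64  # whatever fits on your single GPU
scaled_lr = ref_base_lr * my_batch_size / ref_batch_size  # linear scaling rule

# Pass the result to main_swav.py, e.g. --batch_size 64 --base_lr 0.075
print(f"--batch_size {my_batch_size} --base_lr {scaled_lr:.3f}")
```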
Are you asking about the size of the dataset required? If so, I don't know, since I've only been experimenting with ImageNet.
Yes, I would recommend adjusting the number of prototypes to your custom dataset. For example, if your dataset has 10 classes, you can try using --nmb_prototypes 100.
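For context, --nmb_prototypes only controls the width of the prototypes head; here is a minimal sketch of that layer, assuming the default --feat_dim of 128 and a bias-free linear construction:

```python
import torch
import torch.nn as nn

# Sketch of the prototypes head: --nmb_prototypes sets the number of columns
# of a bias-free linear layer applied to the L2-normalized projected features.
feat_dim, nmb_prototypes = 128, 100  # 100 prototypes for a ~10-class dataset
prototypes = nn.Linear(feat_dim, nmb_prototypes, bias=False)

z = nn.functional.normalize(torch.randn(32, feat_dim), dim=1)  # projected features
scores = prototypes(z)  # (32, 100) feature-prototype similarities fed to Sinkhorn
```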
The world_size parameter appears in the codebase because the code implements the distributed Sinkhorn algorithm. In the paper I wrote the pseudo-code in the non-distributed setting. In your case the two are equivalent since world_size=1.
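For reference, here is a sketch of the non-distributed Sinkhorn-Knopp normalization following the paper's pseudo-code (names and defaults here are illustrative, not copied from main_swav.py); with world_size=1 the all-reduce steps in the distributed version are no-ops, so both compute the same assignments:

```python
import torch

@torch.no_grad()
def sinkhorn(scores, eps=0.05, niters=3):
    """Non-distributed Sinkhorn-Knopp as in the paper's pseudo-code.
    scores: (B, K) similarities between B features and K prototypes."""
    Q = torch.exp(scores / eps).t()               # (K, B)
    Q /= Q.sum()                                  # make Q a joint distribution
    K, B = Q.shape
    r = torch.ones(K, device=Q.device) / K        # uniform marginal over prototypes
    c = torch.ones(B, device=Q.device) / B        # uniform marginal over samples
    for _ in range(niters):
        Q *= (r / Q.sum(dim=1)).unsqueeze(1)      # row normalization
        Q *= (c / Q.sum(dim=0)).unsqueeze(0)      # column normalization
    return (Q / Q.sum(dim=0, keepdim=True)).t()   # (B, K), each row sums to 1
```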
Hope that helps
Thank you very much for your detailed answers. Here are several additional questions.
The second question above refers to the size of the dataset. Since only one GPU can be used, too many samples will make training take too long, but I'm worried that with too few samples the self-supervised training will not achieve good results.
When the number of GPUs drops to 1, do I need to adjust the length of the queue and batch size accordingly?
If there are identical images in the training samples, will it affect the training result?
Will class imbalance (very unequal numbers of samples per class) in the dataset have a large impact on the final training results?
Looking forward to your reply. Thanks.
Hi @mmgongnpu, sorry for the delay.
I have not experimented with small datasets, but I'd expect the method to work well in this scenario if you adjust the hyperparameters (e.g. a small number of prototypes, the learning rate, ...).
I'd say you should use the biggest batch size you can fit on one GPU. For the queue, use the longest queue that does not make the training diverge.
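One way to think about it (a sketch with illustrative numbers; the exact constraints on --queue_length depend on the codebase, so treat the "multiple of the batch size" choice as a convention rather than a hard rule):

```python
# Illustrative single-GPU settings: the queue stores features from past batches,
# so its length is naturally chosen as a multiple of the batch size, and it is
# usually started only after a few epochs so the stored features are meaningful.
batch_size = 64              # largest batch that fits on the single GPU
stored_batches = 15          # how many past batches of features to keep

queue_length = stored_batches * batch_size   # e.g. 960; shrink if training diverges
epoch_queue_starts = 15                      # start using the queue after some epochs

print(f"--batch_size {batch_size} --queue_length {queue_length} "
      f"--epoch_queue_starts {epoch_queue_starts}")
```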
I'd assume it will.
I don't know.
Closing due to inactivity. Feel free to reopen if needed.
I'd like to ask a few questions.
Due to limited GPU resources, I can only use a single GPU to run SwAV experiments. In this case, which experimental parameters need to be adjusted? Will the performance of the pre-trained model decrease significantly?
At a minimum, how many instances are needed to get relatively good pre-training results?
Regarding the model hyperparameter args.nmb_prototypes: if the custom dataset has only a few classes (far fewer than 1k), is it necessary to adjust it accordingly?
In line 371 of main_swav.py, why does args.world_size appear in the code but not in the pseudo-code in the paper?
Thanks again for open-sourcing this code. I am looking forward to your reply.