facebookresearch / swav

PyTorch implementation of SwAV: https://arxiv.org/abs/2006.09882

Question about loss and hyperparameters #69

Closed · TopTea1 closed this issue 3 years ago

TopTea1 commented 3 years ago

Hi, first of all, thanks for sharing your great work!

I'm trying to train my network with SwAV in a 2x160 + 4x96 multi-crop setting, using the hyperparameters provided in the bs_256 script. The loss starts to decrease, but seems to get stuck after 3-4 epochs. Do I need to adapt the hyperparameters, or make some other changes?
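For context, a rough sketch of the multi-crop flags this setting corresponds to, using the flag names from main_swav.py (the scale ranges below are assumptions carried over from the repo's 224/96 defaults, not tested values):

```bash
# Hypothetical flags for a 2x160 + 4x96 multi-crop setting;
# the scale ranges are assumptions, not values from the repo.
python main_swav.py \
  --nmb_crops 2 4 \
  --size_crops 160 96 \
  --min_scale_crops 0.14 0.05 \
  --max_scale_crops 1.0 0.14
# (all other hyperparameters as in the bs_256 script)
```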

Thanks for your help!

Erfun76 commented 3 years ago

Reduce the queue length. I encountered this issue in some of my training runs, and using a smaller queue solved the problem.

xcvil commented 3 years ago

I had the same issue. During warmup training, the loss stopped decreasing and stayed at the same value.

mathildecaron31 commented 3 years ago

Hi @TopTea1, thanks for your interest and your kind words. As suggested by @Erfun76, I would recommend reducing the queue length, or starting the queue later in training (e.g. `--epoch_queue_starts 50`). Also feel free to take a look at this section for tips on how to get the model to train: https://github.com/facebookresearch/swav#common-issues
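Concretely, on top of the bs_256 hyperparameters, the two mitigations would look something like this (a sketch; 1920 and 50 are illustrative values, not tested settings):

```bash
# Sketch of the two suggested mitigations for a stalled loss:
# a shorter queue, and a queue that only starts later in training.
python main_swav.py \
  --queue_length 1920 \
  --epoch_queue_starts 50
# (all other hyperparameters as in the bs_256 script,
# which uses --queue_length 3840)
```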

ibro45 commented 1 year ago

If I understood correctly, you still keep the same number of prototypes (3000) and batch size (64), while reducing the queue size from 3840 to, e.g., 50? How does that influence the equipartition constraint?
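For reference, the constraint in question is the one the SwAV paper places on the code assignments Q solved by Sinkhorn-Knopp, where K is the number of prototypes and B the number of feature vectors assigned in one step (batch plus queue):

$$\mathcal{Q}=\left\{\mathbf{Q}\in\mathbb{R}_{+}^{K\times B}\;\middle|\;\mathbf{Q}\,\mathbf{1}_{B}=\tfrac{1}{K}\mathbf{1}_{K},\ \mathbf{Q}^{\top}\mathbf{1}_{K}=\tfrac{1}{B}\mathbf{1}_{B}\right\}$$

With batch size 64, shrinking the queue from 3840 to 50 shrinks B from 3904 to 114, so with K = 3000 the equal partition across prototypes is enforced over far fewer samples per step.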