facebookresearch / moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
MIT License

Loss stuck at ~6.90 #12

Closed: bobi461993 closed this issue 4 years ago

bobi461993 commented 4 years ago

I am trying to train MoCo v2 on a machine with 2 GPUs, using the hyperparameters recommended in this repo. However, the loss gets stuck at around 6.90. Is this behaviour normal, or should I try a different set of hyperparameters? I see that you used a machine with 8 GPUs; could this explain the difference? Thanks!
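
For context, the number being monitored is the contrastive (InfoNCE) loss over the queue. Below is a minimal sketch of that computation, assuming the standard MoCo v2 setup with a 65536-entry queue and temperature 0.2; `q`, `k` and `queue` are random placeholders, not the repo's actual code:

```python
# Minimal sketch of the InfoNCE loss MoCo prints, with random placeholder
# features instead of encoder outputs (illustrative only, not the repo's code).
import torch
import torch.nn.functional as F

N, C, K, T = 256, 128, 65536, 0.2              # batch, feature dim, queue size, MoCo v2 temperature

q = F.normalize(torch.randn(N, C), dim=1)      # query features
k = F.normalize(torch.randn(N, C), dim=1)      # positive key features
queue = F.normalize(torch.randn(C, K), dim=0)  # negative keys from the queue

l_pos = torch.einsum('nc,nc->n', q, k).unsqueeze(-1)  # N x 1 positive logits
l_neg = torch.einsum('nc,ck->nk', q, queue)           # N x K negative logits
logits = torch.cat([l_pos, l_neg], dim=1) / T         # N x (1+K)
labels = torch.zeros(N, dtype=torch.long)             # the positive is class 0

loss = F.cross_entropy(logits, labels)
print(loss.item())  # with untrained (random) features this sits near log(1+K) ≈ 11.1
```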

amsword commented 4 years ago

I had similar loss values, and my fine-tuned performance was 66.9 (vs. 67.5 reported in the README). So a loss value around 6.9 does not look that bad.

bobi461993 commented 4 years ago

Thank you for the prompt response!

KaimingHe commented 4 years ago

I suggest you finish training and check the final result. The loss should keep decreasing if you monitor it for longer. 6.9 is not a special number here: the random-guess loss is log(65536) ≈ 11.09, since the contrastive task is over the 65536-entry queue plus one positive, not log(1000) ≈ 6.91 as in 1000-way ImageNet classification.
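
For reference, a quick check of those two numbers (assuming the default 65536-entry dictionary):

```python
# Quick arithmetic check of the comment above: MoCo's contrastive task is a
# (K+1)-way classification over the 65536-entry queue, not 1000 ImageNet classes.
import math
print(math.log(65536))  # ≈ 11.09, random-guess loss for the contrastive task
print(math.log(1000))   # ≈ 6.91,  random-guess loss for 1000-way classification
```

So a loss near 6.9 already reflects substantial progress relative to the ~11.09 starting point, rather than random guessing.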