Slow convergence with SGD linear evaluation

Hi!

I am currently running a linear evaluation of a SimSiam network that I have just trained in a different repository. Instead of the evaluation protocol you have written, I use one preferred by a few other papers: batch size 256, 100 epochs, SGD with momentum, learning rate 0.3, and weight decay 0.
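For concreteness, here is a minimal PyTorch sketch of the setup I mean, not my actual code. The tiny backbone, the feature size, the number of classes, and the helper name `build_linear_probe` are placeholders for illustration, and the momentum value of 0.9 is an assumption, since I only stated "SGD with momentum" above.

```python
import torch
import torch.nn as nn


def build_linear_probe(backbone, feat_dim, num_classes,
                       lr=0.3, momentum=0.9, weight_decay=0.0):
    """Freeze the backbone and create a linear classifier with its SGD optimizer.

    lr and weight_decay follow the protocol described above;
    momentum=0.9 is an assumed value.
    """
    # Linear evaluation: the pretrained backbone receives no gradient updates.
    for param in backbone.parameters():
        param.requires_grad = False
    backbone.eval()  # keep BatchNorm statistics fixed during evaluation

    classifier = nn.Linear(feat_dim, num_classes)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=lr,
                                momentum=momentum, weight_decay=weight_decay)
    return classifier, optimizer


# Toy stand-in for the pretrained encoder (hypothetical, for illustration only).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.ReLU())
classifier, optimizer = build_linear_probe(backbone, feat_dim=32, num_classes=10)

# One training step on a random batch of size 256.
images = torch.randn(256, 3, 8, 8)
labels = torch.randint(0, 10, (256,))
with torch.no_grad():
    feats = backbone(images)  # frozen features
loss = nn.functional.cross_entropy(classifier(feats), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Only the classifier's weight and bias are passed to the optimizer, so the pretrained weights stay untouched throughout the 100 epochs.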

My first thought was that my code had a bug, because even with the weights you shared in this repository, my evaluation started at 5% accuracy after the first epoch, which is somewhat close to the performance of random weights. Now that a few epochs have passed I see some progress, and I may reach 30%+ after 10 epochs. However, other self-supervised methods start this evaluation at 60% right after the first epoch.

Do you have any guesses as to why I am seeing such slow convergence with SimSiam?

Thank you.

facebookresearch / simsiam
