fortisaqua opened this issue 3 years ago
It's similar to MoCo; in our case it helps by ~1% across the board. We did not integrate the code into this repo to avoid increasing its complexity.
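For context, the MoCo-style memory bank referred to here is essentially a FIFO queue of embeddings from past batches that are reused as extra negatives for the contrastive loss. A minimal sketch, not this repo's implementation (the class name, sizes, and the NumPy representation are all illustrative):

```python
from collections import deque

import numpy as np


class FeatureQueue:
    """Toy MoCo-style memory bank: a fixed-size FIFO queue of past
    embeddings that serve as additional negatives. When the queue is
    full, the oldest embeddings are evicted first."""

    def __init__(self, size=4096, dim=128):
        self.queue = deque(maxlen=size)  # deque handles FIFO eviction
        self.dim = dim

    def enqueue(self, batch_feats):
        # Add each embedding from the current batch; oldest entries
        # are dropped automatically once `size` is exceeded.
        for f in batch_feats:
            self.queue.append(np.asarray(f, dtype=np.float64))

    def negatives(self):
        # Stack the stored embeddings into an (n, dim) array of negatives.
        if not self.queue:
            return np.empty((0, self.dim))
        return np.stack(list(self.queue))
```

The appeal for small-batch training is that the effective number of negatives is the queue size, not the batch size, at the cost of the stored embeddings being slightly stale.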
I was also interested in this question. The authors argue that, with a batch size as large as 4k, the memory bank has only a very limited impact. However, for those of us without a SOTA GPU/compute setup, it could make the difference between being able to train such a model and not being able to at all.
You may want to check out Table 2 in https://proceedings.neurips.cc/paper/2021/file/628f16b29939d1b060af49f66ae0f7f8-Paper.pdf
Thank you for that! It's a very helpful paper and will probably save us some GPU budget ^^ A question about it: I've been a bit confused by the term "linear" in the SSL literature. Does linear evaluation accuracy indicate that the (SSL-pretrained) model wasn't fine-tuned end-to-end, but that only the layers added on top of it were trained?
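Roughly, yes: in the standard linear evaluation protocol the pretrained encoder is frozen entirely, the SSL projection head is discarded, and a single linear classifier is trained on the frozen features. A minimal sketch with synthetic data, where a fixed random projection stands in for the frozen encoder (everything here is illustrative, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen SSL-pretrained backbone: a fixed projection
# followed by a nonlinearity. Its weights are NEVER updated below.
W_frozen = rng.normal(size=(16, 32)) / 4.0


def encode(x):
    # Frozen feature extractor (no gradient updates during linear eval).
    return np.tanh(x @ W_frozen)


# Synthetic binary classification data (illustrative).
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

feats = encode(X)  # features are computed once and stay fixed

# The ONLY trainable parameters: one linear layer on top of the features.
w = np.zeros(32)
b = 0.0
lr = 0.5
for _ in range(500):
    logits = feats @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    grad = p - y                         # dL/dlogits for logistic loss
    w -= lr * (feats.T @ grad) / len(y)
    b -= lr * grad.mean()

acc = ((feats @ w + b > 0) == (y > 0.5)).mean()
print(f"linear eval accuracy: {acc:.2f}")
```

So "linear evaluation accuracy" measures how linearly separable the frozen representation is, which is why it is reported separately from fine-tuning accuracy, where the whole network is updated.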
While reading the paper Big Self-Supervised Models are Strong Semi-Supervised Learners, I noticed some MoCo-like features, such as a memory buffer and a momentum update parameter (the EMA decay rate). Can you tell me where you used these parameters?
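For reference, the momentum update mentioned here is the MoCo-style exponential moving average of encoder weights, theta_k <- m * theta_k + (1 - m) * theta_q, where m is the EMA decay rate (typically close to 1, e.g. 0.999). A minimal sketch (the function name and the list-of-arrays representation of parameters are illustrative):

```python
def ema_update(key_params, query_params, m=0.999):
    """Return the EMA-updated key-encoder parameters.

    Each key parameter moves a small step (1 - m) toward the
    corresponding query-encoder parameter, so the key encoder
    evolves slowly and provides consistent targets.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]
```

In MoCo-style training this update replaces backpropagation into the key encoder: only the query encoder receives gradients, and the key encoder is refreshed once per step via this EMA.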