fortisaqua opened this issue 3 years ago
It's similar to MoCo; in our case it helps by ~1% across the board. We did not integrate the code into this repo to avoid increasing its complexity.
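For context, the MoCo-style memory bank referred to here is essentially a FIFO queue of embeddings from past batches that are reused as extra negatives for the contrastive loss. A minimal sketch, not this repo's implementation (the class name, sizes, and the NumPy representation are all illustrative):

```python
from collections import deque

import numpy as np


class FeatureQueue:
    """Toy MoCo-style memory bank: a fixed-size FIFO queue of past
    embeddings that serve as additional negatives. When the queue is
    full, the oldest embeddings are evicted first."""

    def __init__(self, size=4096, dim=128):
        self.queue = deque(maxlen=size)  # deque handles FIFO eviction
        self.dim = dim

    def enqueue(self, batch_feats):
        # Add each embedding from the current batch; oldest entries
        # are dropped automatically once `size` is exceeded.
        for f in batch_feats:
            self.queue.append(np.asarray(f, dtype=np.float64))

    def negatives(self):
        # Stack the stored embeddings into an (n, dim) array of negatives.
        if not self.queue:
            return np.empty((0, self.dim))
        return np.stack(list(self.queue))
```

The appeal for small-batch training is that the effective number of negatives is the queue size, not the batch size, at the cost of the stored embeddings being slightly stale.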
I was also interested in this question. The authors argue that, with a batch size as large as 4k, the memory bank has only a very limited impact. However, for those of us without a SOTA GPU/compute setup, it could make the difference between being able to train such a model and not being able to at all.
You may want to check out Table 2 in https://proceedings.neurips.cc/paper/2021/file/628f16b29939d1b060af49f66ae0f7f8-Paper.pdf
Thank you for that! It's a very helpful paper and will probably save us some GPU budget ^^ A question about it: I've been a bit confused by the term "linear" in the SSL literature. Does linear evaluation accuracy indicate that the (SSL-pretrained) model wasn't fine-tuned end-to-end, but that only the layers added on top of it were trained?
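Roughly, yes: in the standard linear evaluation protocol the pretrained encoder is frozen entirely, the SSL projection head is discarded, and a single linear classifier is trained on the frozen features. A minimal sketch with synthetic data, where a fixed random projection stands in for the frozen encoder (everything here is illustrative, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen SSL-pretrained backbone: a fixed projection
# followed by a nonlinearity. Its weights are NEVER updated below.
W_frozen = rng.normal(size=(16, 32)) / 4.0


def encode(x):
    # Frozen feature extractor (no gradient updates during linear eval).
    return np.tanh(x @ W_frozen)


# Synthetic binary classification data (illustrative).
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

feats = encode(X)  # features are computed once and stay fixed

# The ONLY trainable parameters: one linear layer on top of the features.
w = np.zeros(32)
b = 0.0
lr = 0.5
for _ in range(500):
    logits = feats @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    grad = p - y                         # dL/dlogits for logistic loss
    w -= lr * (feats.T @ grad) / len(y)
    b -= lr * grad.mean()

acc = ((feats @ w + b > 0) == (y > 0.5)).mean()
print(f"linear eval accuracy: {acc:.2f}")
```

So "linear evaluation accuracy" measures how linearly separable the frozen representation is, which is why it is reported separately from fine-tuning accuracy, where the whole network is updated.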
While reading the paper Big Self-Supervised Models are Strong Semi-Supervised Learners, I noticed some MoCo-like features, such as a memory buffer and a momentum update parameter (the EMA decay rate). Can you tell me where you used these parameters?
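For reference, the momentum update mentioned here is the MoCo-style exponential moving average of encoder weights, theta_k <- m * theta_k + (1 - m) * theta_q, where m is the EMA decay rate (typically close to 1, e.g. 0.999). A minimal sketch (the function name and the list-of-arrays representation of parameters are illustrative):

```python
def ema_update(key_params, query_params, m=0.999):
    """Return the EMA-updated key-encoder parameters.

    Each key parameter moves a small step (1 - m) toward the
    corresponding query-encoder parameter, so the key encoder
    evolves slowly and provides consistent targets.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]
```

In MoCo-style training this update replaces backpropagation into the key encoder: only the query encoder receives gradients, and the key encoder is refreshed once per step via this EMA.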