LMerCy closed this issue 2 years ago
Yeah, it is interesting (and this is where the value of SimSiam lies) that the BYOL authors did not find it to work in their setting when removing the EMA. Both groups have released code, so you can check the details and investigate what causes the difference in observations. Who knows, it could be the next research paper, so have fun!
Table 19 (fifth-to-last row) in the BYOL paper seems to run the same experiment as SimSiam: drop the negative samples, drop the EMA, and use stop-gradient. But it only gets 5.5% in linear evaluation. What is the difference between this experiment and SimSiam?
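For anyone comparing the two setups, the configuration described above (no negative pairs, no EMA target, stop-gradient on one branch) can be sketched roughly as a symmetrized negative cosine similarity, which is how SimSiam states its loss. This is only a minimal pure-Python illustration, not code from either repository: the helper names (`neg_cosine`, `grad_wrt_p`, `symmetric_loss`) are hypothetical, and whether the BYOL ablation matches it in every detail is exactly the open question. In an autodiff framework the stop-gradient would be something like `z.detach()`; here it is modeled by differentiating with respect to `p` only.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(v):
    # Return the L2-normalized vector and its norm.
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v], n

def neg_cosine(p, z):
    # D(p, z) = -cos(p, z). p is the predictor output,
    # z is the other view's projection (the stop-gradient branch).
    p_hat, _ = _unit(p)
    z_hat, _ = _unit(z)
    return -_dot(p_hat, z_hat)

def grad_wrt_p(p, z):
    # Gradient of D(p, z) with z held constant, i.e. what stop-gradient
    # leaves behind: no gradient is computed for z at all.
    p_hat, p_norm = _unit(p)
    z_hat, _ = _unit(z)
    c = _dot(p_hat, z_hat)
    return [-(zh - c * ph) / p_norm for ph, zh in zip(p_hat, z_hat)]

def symmetric_loss(p1, z1, p2, z2):
    # L = 1/2 * D(p1, stopgrad(z2)) + 1/2 * D(p2, stopgrad(z1))
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

With this written out, the things worth diffing between the two code bases are the parts the sketch leaves out: predictor and projector architecture, normalization layers, optimizer, and learning-rate schedule.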