
Guillem96 / data2vec-vision

PyTorch implementation of Data2Vec self-supervised approach for vision use cases.

GNU General Public License v3.0

18 stars, 5 forks

Questions about small initial value of tau for EMA and usage of SGD optimizer instead of Adam optimizer #4

Closed: Ruizhuo-Xu closed this issue 1 year ago

Ruizhuo-Xu commented 1 year ago

I have some questions. Why is the initial value of the EMA parameter tau set so small (0.1) in the code, when the paper uses 0.9998 or 0.9999?
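For context, the data2vec paper updates the teacher as an exponential moving average of the student, Δ ← τΔ + (1 − τ)θ. A τ close to 1 therefore makes the teacher change slowly. As I understand the paper, τ is also ramped linearly from a start value to a target value and then held constant. The sketch below is a minimal illustration under a few assumptions. It uses plain Python lists of floats in place of model parameters. The names `ema_update` and `tau_schedule` and the `ramp_steps` value are hypothetical. The two τ values from the question are used only as example endpoints.

```python
def ema_update(teacher, student, tau):
    """Paper convention: teacher <- tau * teacher + (1 - tau) * student."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def tau_schedule(step, tau_start, tau_end, ramp_steps):
    """Linearly increase tau from tau_start to tau_end over ramp_steps updates, then hold it."""
    if step >= ramp_steps:
        return tau_end
    return tau_start + (tau_end - tau_start) * step / ramp_steps


teacher = [0.0]
student = [1.0]
# With tau = 0.1 the teacher moves 90% of the way to the student in one update.
print(ema_update(teacher, student, 0.1))
# With tau = 0.9999 it moves only about 0.01% of the way.
print(ema_update(teacher, student, 0.9999))
# Hypothetical ramp: tau halfway between the two endpoints after 500 of 1000 ramp steps.
print(tau_schedule(500, 0.9998, 0.9999, ramp_steps=1000))
```

Some codebases write the update the other way round, teacher ← (1 − τ)·teacher + τ·student. Under that convention τ = 0.1 keeps 90% of the teacher per step, which is still much faster than the paper's setting (τ = 0.9998 keeps 99.98%). Checking which form the repository's update line uses would settle how the two numbers compare.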

Furthermore, why does the code use the SGD optimizer instead of the Adam optimizer mentioned in the paper? Are these differences due to the size of the training dataset?
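One way to see why swapping the optimizer matters is to compare the single-step update rules. The sketch below is plain Python on a scalar parameter with no PyTorch, and the function names are hypothetical. The formulas follow the update rules documented for PyTorch's `torch.optim.SGD` (with momentum, no dampening or Nesterov) and `torch.optim.Adam` (with bias correction, no weight decay). The hyperparameter defaults are illustrative only.

```python
import math


def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.9):
    """One SGD-with-momentum step: v <- momentum * v + g, then p <- p - lr * v."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step with bias correction; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Compare the first step for a tiny and a huge gradient, starting from zero state.
for g in (0.001, 1000.0):
    p_sgd, _ = sgd_momentum_step(0.0, g, velocity=0.0)
    p_adam, _, _ = adam_step(0.0, g, m=0.0, v=0.0, t=1)
    print(f"grad={g}: SGD step={p_sgd:.6g}, Adam step={p_adam:.6g}")
```

On the first step Adam moves the parameter by roughly `lr` regardless of the gradient's magnitude, while the SGD step scales with the gradient. As a result, a learning rate tuned for one optimizer generally does not transfer to the other.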