how about the training results for vit_conv_?

facebookresearch / moco-v3

PyTorch implementation of MoCo v3 https://arxiv.org/abs/2104.02057

how about the training results for vit_conv_? #18

Closed. Dongshengjiang closed this issue 2 years ago

Dongshengjiang commented 2 years ago

I want to know how to train vit_conv_small and how well it performs. According to the paper, is it necessary to apply stop gradient to the four convolution embedding layers?

endernewton commented 2 years ago

I believe you can just swap the '--arch' argument (e.g. to vit_conv_small) and it will train. The performance is often 1-2% higher.
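As a rough illustration of "just swap the '--arch'", the sketch below builds two launch commands that differ only in the architecture name. Only the '--arch' flag and the name vit_conv_small come from this thread; the script name main_moco.py, the baseline name vit_small, and the positional dataset path are assumptions made for the example, so check them against the repository before use.

```python
import shlex


def build_pretrain_command(arch, data_dir, script="main_moco.py", extra_args=()):
    """Return an argv list for a pre-training run with the given --arch.

    `script` and the positional `data_dir` are assumptions for this sketch;
    only the --arch flag is taken from the thread.
    """
    return ["python", script, "--arch", arch, *extra_args, data_dir]


# Assumed baseline architecture name vs. the conv-stem variant from the question.
baseline = build_pretrain_command("vit_small", "/path/to/imagenet")
conv_stem = build_pretrain_command("vit_conv_small", "/path/to/imagenet")

# Collect the positions where the two commands differ: only the arch value.
changed = [(a, b) for a, b in zip(baseline, conv_stem) if a != b]

print(shlex.join(conv_stem))
print(changed)
```

Any other hyperparameters would go in `extra_args` and stay identical between the two runs.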

With the conv stem, no stop gradient is needed to make training stable. It is also probably not a good idea to stop gradients on the 4 additional layers in the conv stem.
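To make the distinction concrete, here is a minimal PyTorch sketch, not the repository's code. It assumes that "stop gradient" means freezing the parameters of the embedding layer, and it uses hypothetical stand-ins: a single 16x16 patch projection, and a conv stem made of four stride-2 3x3 convolutions followed by a 1x1 projection. The layer widths and the embedding size of 384 are illustrative choices.

```python
import torch
import torch.nn as nn


def freeze(module):
    """Stop gradients into a module by marking its parameters non-trainable."""
    for p in module.parameters():
        p.requires_grad = False


def make_conv_stem(embed_dim=384):
    """Hypothetical conv stem: four 3x3 stride-2 convs, then a 1x1 projection."""
    layers = []
    in_ch, out_ch = 3, embed_dim // 8
    for _ in range(4):
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        ]
        in_ch, out_ch = out_ch, out_ch * 2
    layers.append(nn.Conv2d(in_ch, embed_dim, kernel_size=1))
    return nn.Sequential(*layers)


# Plain patch embedding: a single strided conv, frozen (gradient stopped).
patch_proj = nn.Conv2d(3, 384, kernel_size=16, stride=16)
freeze(patch_proj)

# Conv stem: left fully trainable, as suggested in the reply above.
stem = make_conv_stem()

x = torch.randn(2, 3, 224, 224)
tokens = stem(x)          # one embedding vector per 16x16 image region
tokens.sum().backward()   # gradients flow into every stem layer

stem_gets_grads = all(p.grad is not None for p in stem.parameters())
patch_is_frozen = all(not p.requires_grad for p in patch_proj.parameters())
print(tokens.shape, stem_gets_grads, patch_is_frozen)
```

Both embeddings produce the same 14x14 grid of tokens for a 224x224 input. The only difference shown here is whether the embedding layers receive gradient updates.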