Closed. Dongshengjiang closed this issue 2 years ago.
I believe you can just swap the '--arch' value to vit_conv_small and it will train. The performance is often higher (by 1-2%).
For the conv stem, no stop-gradient is needed to keep training stable. It is also probably not a good idea to stop the gradient on the 4 additional layers in the conv stem.
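For anyone unsure what "stop gradient" means here: it usually amounts to freezing the patch-embedding projection so that it keeps its random initialization during training. Below is a rough PyTorch sketch of that idea, assuming a plain single-conv patch embedding. `TinyPatchEmbed` and `freeze_patch_projection` are hypothetical names made up for illustration, not the repository's actual code.

```python
import torch
import torch.nn as nn


class TinyPatchEmbed(nn.Module):
    """Hypothetical stand-in for a ViT patch embedding: one strided conv."""

    def __init__(self, in_chans=3, embed_dim=8, patch_size=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # (B, C, H, W) -> (B, num_patches, embed_dim)
        return self.proj(x).flatten(2).transpose(1, 2)


def freeze_patch_projection(patch_embed):
    """Illustrative 'stop gradient': exclude the projection from training."""
    for p in patch_embed.proj.parameters():
        p.requires_grad = False


embed = TinyPatchEmbed()
head = nn.Linear(8, 2)  # placeholder for the rest of the network

# Skip this call for a conv stem, following the advice above.
freeze_patch_projection(embed)

x = torch.randn(2, 3, 16, 16)
tokens = embed(x)
loss = head(tokens).mean()
loss.backward()

# The frozen projection receives no gradient; the rest still trains.
print(embed.proj.weight.grad is None)
print(head.weight.grad is not None)
```

For the conv-stem variant, the comment above suggests leaving all stem layers trainable, which in this sketch simply means not calling the freezing helper.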
I want to know how to train vit_conv_small and what performance it reaches. According to the paper, is it necessary to stop the gradient for the four convolutional embedding layers?