j93hahn opened this issue 2 months ago
Thanks for open-sourcing the code! I have a question: your paper seems to revolve around mono-architectural weight initialization. What if I want to use a very large pretrained ViT to initialize a much smaller CNN?
Using the weights alone doesn't seem as pertinent, especially since CNNs and ViTs do not carry the same inductive biases. Do you know of any papers exploring this direction?

Hi @j93hahn,
Thanks for your interest in our paper. And yes, we primarily focus on within-model-family weight transfer.
From a practical perspective, there is no good reason to initialize a much smaller CNN with a large pretrained ViT.
In the OpenReview discussion of our paper, we ran an initial experiment on cross-architecture weight initialization (initializing ViT-T with an isotropic ConvNeXt-S), but the performance gain was far lower than with within-model-family initialization.
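To make concrete what a naive cross-architecture transfer looks like, here is a rough sketch that simply copies parameters whose names and shapes happen to coincide. The torchvision models and the matching heuristic below are only illustrative assumptions, not the setup from the paper or the OpenReview experiment:

```python
# Rough sketch only: the naive "copy whatever happens to line up" baseline for
# cross-architecture initialization. The torchvision models are placeholders,
# not the exact models or the mapping used in our experiments.
import torch
from torchvision.models import convnext_small, vit_b_16

def transfer_matching_weights(src: torch.nn.Module, dst: torch.nn.Module):
    """Copy src parameters into dst wherever both the name and shape agree."""
    src_state = src.state_dict()
    dst_state = dst.state_dict()
    copied = []
    for name, tensor in dst_state.items():
        if name in src_state and src_state[name].shape == tensor.shape:
            dst_state[name] = src_state[name].clone()
            copied.append(name)
    dst.load_state_dict(dst_state)
    return copied

# Hypothetical direction matching the discussion: a pretrained ConvNeXt as the
# source, a randomly initialized ViT as the target.
src = convnext_small(weights="IMAGENET1K_V1")
dst = vit_b_16()
copied = transfer_matching_weights(src, dst)
# Across model families the parameter names barely overlap, so almost nothing
# gets copied -- which is why cross-architecture transfer needs an explicit
# layer-to-layer mapping rather than this kind of naive matching.
print(f"Copied {len(copied)} tensors")
```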
Please feel free to contact me by email if you would like to discuss the details of your approach or idea.

Best, Oscar