As mentioned in README.md, the first stage of pre-training keeps the ViT fixed (frozen), and the second stage trains the ViT end-to-end. My question is: should the second stage load the model trained in the first stage?