CheukHinHoJerry opened 2 years ago
Hi, and thank you for your interest. I've tried this and other strategies to further improve fine-tuning results, but there are some challenges. I'm doing some work to alleviate these challenges that will soon be presented at the ICML workshop https://pretraining.github.io/. Stay tuned!
Hi, please have a look at Pull request #7, which makes it possible to improve fine-tuning results. Also, see the report on arXiv corresponding to this pull request, which shows fine-tuning results and training curves obtained with this code when training for up to 300 epochs.
Thanks a lot! Will definitely take a look. Appreciated!
Hi, in the paper "Pretraining a Neural Network before Knowing Its Architecture", can you explain why the orthogonal re-initialization is not applied to some layers? The paper states: "Furthermore, the orthogonal re-initialization step introduced next in Section 3.2 is not beneficial or applicable to some layers (e.g. first layers or batch normalization layers)."
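For context, orthogonal initialization is only defined for parameters that can be viewed as matrices, which is presumably one reason it cannot apply to batch-normalization layers, whose weights are 1-D scale vectors. Below is a minimal sketch of orthogonally re-initializing weights while skipping such layers; it only illustrates the general idea and is not the paper's exact procedure from Section 3.2:

```python
import torch.nn as nn

# Toy model with conv, batch-norm, and linear layers.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16),
                      nn.ReLU(), nn.Flatten(), nn.Linear(16 * 30 * 30, 10))

for name, module in model.named_modules():
    # Skip batch-norm layers: their weights are 1-D scale vectors,
    # so an orthogonal matrix initialization is not applicable.
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d)):
        continue
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.orthogonal_(module.weight)
```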
Also, for the PCA visualization in Figure 4 of that paper, how do you obtain a vector representation for each architecture? I thought each operation in an architecture has its own encoding vector.
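One common way to obtain a single vector per architecture from per-operation (per-node) embeddings is to average-pool the node embeddings before applying PCA. The sketch below only illustrates that pooling idea with made-up data; whether Figure 4 is produced this way is exactly what the question asks, and the name `node_embeddings_per_arch` is hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical input: one (num_nodes, dim) array of operation/node
# embeddings per architecture (e.g. extracted from the GHN encoder).
node_embeddings_per_arch = [np.random.randn(n, 32) for n in (12, 20, 17)]

# Average-pool node embeddings to get one vector per architecture.
arch_vectors = np.stack([e.mean(axis=0) for e in node_embeddings_per_arch])

# Project the architecture vectors to 2D for visualization.
coords = PCA(n_components=2).fit_transform(arch_vectors)
print(coords.shape)  # (num_architectures, 2)
```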
Hi, sorry for the late reply.
Thank you for your work and for generously releasing the code.
In the Google Colab sample, the accuracy of the model was about 60%. I was wondering whether we could continue training the predicted model to achieve higher accuracy. Ideally, this would be faster than training a new model from scratch.
Have you tried this before?
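A minimal sketch of what continuing to train a predicted model could look like, assuming the model with GHN-predicted parameters behaves like an ordinary `torch.nn.Module` and is fine-tuned on CIFAR-10 with plain SGD; the architecture, optimizer, and hyperparameters below are illustrative assumptions, not settings from the repository:

```python
import torch
import torchvision
import torchvision.transforms as T

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Placeholder network; in practice this would be the model whose
# parameters were predicted by the GHN (e.g. in the Colab sample).
model = torchvision.models.resnet18(num_classes=10).to(device).train()

transform = T.Compose([T.RandomCrop(32, padding=4),
                       T.RandomHorizontalFlip(),
                       T.ToTensor()])
train_set = torchvision.datasets.CIFAR10('./data', train=True,
                                         download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Illustrative hyperparameters; a smaller learning rate than training
# from scratch is often used when starting from predicted parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(10):  # fine-tune for a few epochs
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```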