Open ji-xin opened 1 year ago
Hi Ji,
Thanks for your interest in our work. Regarding your questions:
We do not have a way to anticipate how good a pretrained checkpoint will be; we only know after we complete the pretraining and finetuning and then evaluate on the test set of the downstream task. For the pretraining stage, we keep the hyperparameters exactly the same as those of the off-the-shelf model (pretrained on the BookWiki corpus), in order to have a fair comparison between the off-the-shelf and self-pretrained versions.
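To make the setup concrete, here is a minimal sketch (not the authors' actual code) of what self-pretraining with the off-the-shelf hyperparameters could look like using Hugging Face transformers. The corpus path, tokenizer choice, and the specific TrainingArguments values below are illustrative assumptions; the point is only that the architecture config is reused and the same pretraining hyperparameters would be carried over.

```python
# Sketch only: reuse the published RoBERTa config, re-initialize the weights,
# and run masked-language-model pretraining on a replacement corpus.
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Same architecture as the off-the-shelf checkpoint; only the weights are random.
config = RobertaConfig.from_pretrained("roberta-base")
model = RobertaForMaskedLM(config)  # random initialization
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# "your_pretraining_corpus.txt" is a placeholder for whatever text is used
# in place of BookWiki for the self-pretraining run.
raw = load_dataset("text", data_files={"train": "your_pretraining_corpus.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Placeholder hyperparameters; in the paper's setup these would be kept
# identical to those of the original off-the-shelf pretraining.
args = TrainingArguments(
    output_dir="self_pretrained_roberta",
    per_device_train_batch_size=32,
    learning_rate=6e-4,
    max_steps=100_000,
    warmup_steps=6_000,
    weight_decay=0.01,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

After this step, the resulting checkpoint would be finetuned on the downstream task and evaluated on its test set, just like the off-the-shelf checkpoint, so the two can be compared directly.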
There are a few studies that experiment with training transformer models on NLP tasks starting from random initialization and document their performance. See, e.g.: https://arxiv.org/pdf/2012.11995.pdf https://arxiv.org/pdf/2109.04953.pdf https://arxiv.org/pdf/2206.10139.pdf
Hello,
Thanks for the super interesting paper. I actually came across your poster at ACL, and after reading the whole paper I have a few questions regarding the experimental details:
Thanks!