Open ji-xin opened 1 year ago
Hi Ji,
Thanks for your interest in our work. Regarding your questions:
We do not have a way to anticipate how good a pretrained checkpoint will be; we only know after we complete the pretraining and finetuning and then evaluate on the test set of the downstream task. For the pretraining stage, we keep the hyperparameters exactly the same as those of the off-the-shelf model (pretrained on the BookWiki corpus), in order to have a fair comparison between the off-the-shelf and self-pretrained versions.
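To make the setup concrete, here is a minimal sketch (not the authors' actual code) of what self-pretraining with the off-the-shelf hyperparameters could look like using Hugging Face transformers. The corpus path, tokenizer choice, and the specific TrainingArguments values below are illustrative assumptions; the point is only that the architecture config is reused and the same pretraining hyperparameters would be carried over.

```python
# Sketch only: reuse the published RoBERTa config, re-initialize the weights,
# and run masked-language-model pretraining on a replacement corpus.
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Same architecture as the off-the-shelf checkpoint; only the weights are random.
config = RobertaConfig.from_pretrained("roberta-base")
model = RobertaForMaskedLM(config)  # random initialization
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# "your_pretraining_corpus.txt" is a placeholder for whatever text is used
# in place of BookWiki for the self-pretraining run.
raw = load_dataset("text", data_files={"train": "your_pretraining_corpus.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Placeholder hyperparameters; in the paper's setup these would be kept
# identical to those of the original off-the-shelf pretraining.
args = TrainingArguments(
    output_dir="self_pretrained_roberta",
    per_device_train_batch_size=32,
    learning_rate=6e-4,
    max_steps=100_000,
    warmup_steps=6_000,
    weight_decay=0.01,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

After this step, the resulting checkpoint would be finetuned on the downstream task and evaluated on its test set, just like the off-the-shelf checkpoint, so the two can be compared directly.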
There are a few studies that experiment with training transformer models on NLP tasks starting from random initialization and document their performance. See, e.g.: https://arxiv.org/pdf/2012.11995.pdf https://arxiv.org/pdf/2109.04953.pdf https://arxiv.org/pdf/2206.10139.pdf
Hello,
Thanks for the super interesting paper. I actually came across your poster at ACL, and after reading the whole paper I have a few questions regarding the experimental details:
Thanks!