acmi-lab / self-pretrain


Question about the paper #1

Open · ji-xin opened this issue 1 year ago

ji-xin commented 1 year ago

Hello,

Thanks for the super interesting paper. I came across your poster at ACL, and after reading the whole paper I have a few questions about the experimental details:

  1. During pretraining, do you have a way to conveniently anticipate how good a pretrained checkpoint will be, without running through the full finetuning and evaluation? Do you perform any kind of HP tuning and/or early stopping for pretraining?
  2. For training with random initialization, are there other references that train similar models from scratch, so we can be confident that the randomly initialized models were trained properly and reached their best accuracy?

Thanks!

kukrishna commented 1 year ago

Hi Ji,

Thanks for your interest in our work. Regarding your questions:

  1. We do not have a way to anticipate how good a pretrained checkpoint will be; we only find out once we complete the pretraining and finetuning and then evaluate on the test set of the downstream task. For the pretraining stage, we keep the hyperparameters exactly the same as for the off-the-shelf model (pretrained on the BookWiki corpus), so that the comparison between the off-the-shelf and self-pretrained versions is fair. (A rough sketch of this finetune-then-evaluate comparison is given after this list.)

  2. There are a few studies that experiment with training transformer models on NLP tasks from random initialization and document their performance. See, e.g.: https://arxiv.org/pdf/2012.11995.pdf https://arxiv.org/pdf/2109.04953.pdf https://arxiv.org/pdf/2206.10139.pdf
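
To make point 1 concrete, here is a rough, self-contained sketch of the finetune-then-evaluate step using the Hugging Face `Trainer`. This is not our actual code: the dataset (`imdb`), the self-pretrained checkpoint path, and all hyperparameter values below are placeholders. The only thing it is meant to illustrate is that every checkpoint being compared goes through an identical finetuning recipe before being scored on the downstream test set.

```python
# Hypothetical sketch: finetune two checkpoints (off-the-shelf vs. self-pretrained)
# with identical hyperparameters and evaluate on the downstream test set.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

def compute_metrics(eval_pred):
    # Accuracy on the downstream task's test set.
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": (preds == eval_pred.label_ids).mean()}

def finetune_and_eval(checkpoint, dataset):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    encoded = dataset.map(tokenize, batched=True)

    # Same finetuning hyperparameters for every checkpoint being compared
    # (the values here are illustrative, not the paper's).
    args = TrainingArguments(
        output_dir=f"out/{checkpoint.replace('/', '_')}",
        num_train_epochs=3,
        learning_rate=2e-5,
        per_device_train_batch_size=16,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=encoded["train"],
                      eval_dataset=encoded["test"],
                      compute_metrics=compute_metrics)
    trainer.train()
    return trainer.evaluate()

dataset = load_dataset("imdb")  # placeholder downstream task
for ckpt in ["roberta-base", "path/to/self-pretrained-checkpoint"]:
    print(ckpt, finetune_and_eval(ckpt, dataset))
```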