Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Apache License 2.0
Guidance on gpl_steps, new_size and batch_size_gpl #21
I am looking for some guidance on the following parameters of `gpl.train()`:

- `gpl_steps` – do we really need a value as large as 140,000 for a corpus of only 1,300 passages?
- `new_size` – how should this be chosen?
- `batch_size_gpl` – would raising this to 64 or 128 help speed up training?

How should the values of these parameters be derived from the dataset (`corpus.jsonl`)?
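For intuition on why 140,000 steps is likely excessive here: GPL generates a fixed number of synthetic queries per passage, and each training step consumes one batch of (query, passage) pairs, so you can estimate how many times training revisits each generated query. A rough back-of-envelope sketch (the helper below is hypothetical, not part of the `gpl` package, and the 3 queries per passage and batch size of 32 are assumed defaults; check your own config):

```python
def epochs_covered(corpus_size, queries_per_passage, batch_size_gpl, gpl_steps):
    """Rough number of passes over the generated query set.

    Hypothetical helper for intuition only -- not part of the gpl package.
    """
    total_queries = corpus_size * queries_per_passage   # synthetic queries generated
    total_examples_seen = gpl_steps * batch_size_gpl    # examples consumed in training
    return total_examples_seen / total_queries

# With a 1,300-passage corpus, ~3 queries per passage, a batch size
# of 32 and 140,000 steps, each generated query is revisited on the
# order of a thousand times -- so far fewer steps are likely enough
# for a corpus this small.
print(round(epochs_covered(1300, 3, 32, 140_000)))
```

Note that raising `batch_size_gpl` does not reduce the per-step work; it only changes how many examples each step consumes, so by this estimate a larger batch mainly lets you cut `gpl_steps` proportionally.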