Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Apache License 2.0
Guidance on gpl_steps, new_size and batch_size_gpl #21
I am looking for some guidance on the following parameters of `gpl.train()`:

- `gpl_steps` – do we really need a value as large as 140,000 for a corpus of only 1,300 passages?
- `new_size` – how should this be chosen?
- `batch_size_gpl` – would raising this to 64 or 128 help speed up training?

How should the values of these parameters be derived from the dataset (`corpus.jsonl`)?
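For intuition on why 140,000 steps is likely excessive here: GPL generates a fixed number of synthetic queries per passage, and each training step consumes one batch of (query, passage) pairs, so you can estimate how many times training revisits each generated query. A rough back-of-envelope sketch (the helper below is hypothetical, not part of the `gpl` package, and the 3 queries per passage and batch size of 32 are assumed defaults; check your own config):

```python
def epochs_covered(corpus_size, queries_per_passage, batch_size_gpl, gpl_steps):
    """Rough number of passes over the generated query set.

    Hypothetical helper for intuition only -- not part of the gpl package.
    """
    total_queries = corpus_size * queries_per_passage   # synthetic queries generated
    total_examples_seen = gpl_steps * batch_size_gpl    # examples consumed in training
    return total_examples_seen / total_queries

# With a 1,300-passage corpus, ~3 queries per passage, a batch size
# of 32 and 140,000 steps, each generated query is revisited on the
# order of a thousand times -- so far fewer steps are likely enough
# for a corpus this small.
print(round(epochs_covered(1300, 3, 32, 140_000)))
```

Note that raising `batch_size_gpl` does not reduce the per-step work; it only changes how many examples each step consumes, so by this estimate a larger batch mainly lets you cut `gpl_steps` proportionally.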