Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
The previous code did not check whether the batch size is larger than the number of data points (or number of generated queries) in PseudoLabeler.run.
pl/toolkit/pl.py: Added a check at the beginning of run about batch size vs. data size
tests/unit/test_pl.py: Added test
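
For illustration, a check of this kind could look like the sketch below. This is a minimal sketch and not the code from this change: the helper name check_batch_size, its parameters, and the choice to raise ValueError are all assumptions made for the example, and the actual check in PseudoLabeler.run may be written differently (for instance as an assertion).

```python
def check_batch_size(batch_size: int, num_examples: int) -> None:
    """Hypothetical helper: fail early if a batch cannot be filled.

    batch_size   -- requested number of examples per batch
    num_examples -- number of available data points (or generated queries)
    """
    if batch_size <= 0:
        raise ValueError(f"batch_size must be positive, got {batch_size}")
    if batch_size > num_examples:
        # Assumption for this sketch: an oversized batch is treated as an error
        # rather than being silently clamped to the data size.
        raise ValueError(
            f"batch_size ({batch_size}) is larger than the number of "
            f"data points ({num_examples})"
        )


# Usage: the first call passes silently, the second reports the problem.
check_batch_size(batch_size=4, num_examples=10)
try:
    check_batch_size(batch_size=32, num_examples=10)
except ValueError as err:
    print(err)
```

Calling such a helper at the very start of a run means a misconfigured batch size is reported before any model loading or labeling work begins.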