UKPLab / gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Apache License 2.0

GPL with low performant CE #35

Open IliasAarab opened 1 year ago

IliasAarab commented 1 year ago

Does it make sense to train a model with GPL when the cross-encoder (CE) used for pseudo-labeling performs poorly on the domain dataset (i.e., when using the CE directly for IR tasks on the domain dataset gives poor results)? I would expect the GPL-trained model to perform poorly as well, since the CE's performance represents the upper bound of what GPL can achieve.

If my reasoning is correct, is there a way to deal with this shortcoming?
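
For reference, this is roughly how the "CE used directly for IR" check could be measured. It is only a minimal sketch, assuming a small labeled dev set from the target domain is available. The names `mrr_at_k` and `overlap_score` and the toy data are hypothetical and not part of the GPL codebase; `overlap_score` is a token-overlap stand-in for the real cross-encoder, which would be plugged in through the same `score(query, passage)` interface.

```python
from typing import Callable, Dict, List, Set


def mrr_at_k(
    score: Callable[[str, str], float],
    queries: Dict[str, str],
    candidates: Dict[str, List[str]],
    corpus: Dict[str, str],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Rerank each query's candidates with `score` and return MRR@k."""
    total = 0.0
    for qid, query in queries.items():
        # Sort candidate passages by descending relevance score.
        ranked = sorted(
            candidates[qid],
            key=lambda doc_id: score(query, corpus[doc_id]),
            reverse=True,
        )
        # Add the reciprocal rank of the first relevant passage in the top k.
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant[qid]:
                total += 1.0 / rank
                break
    return total / len(queries)


def overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query tokens in the passage."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / max(len(q_tokens), 1)


# Toy in-domain dev set (made up for illustration only).
corpus = {
    "d1": "dense retrieval with bi-encoders",
    "d2": "cross-encoder reranking for retrieval",
    "d3": "cooking pasta at home",
}
queries = {"q1": "cross-encoder reranking", "q2": "dense retrieval"}
candidates = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"]}
relevant = {"q1": {"d2"}, "q2": {"d1"}}

mrr = mrr_at_k(overlap_score, queries, candidates, corpus, relevant)
print(f"MRR@10 = {mrr:.3f}")
```

A low score from this kind of check on real domain data is what I mean by a "bad performer" above. The same function could be reused to compare several candidate CEs before choosing one for pseudo-labeling.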