UKPLab / gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Apache License 2.0

GPL for sentence embedding tasks? #22

Open hanshupe opened 1 year ago

hanshupe commented 1 year ago

In the provided examples, GPL is used for semantic search tasks: given a query, relevant results should be retrieved. Is it also the recommended approach for getting meaningful sentence embeddings / bi-encoders, or is it better to use TSDAE?

rbbby commented 1 year ago

I was wondering the same thing. Any update on this? Intuitively it should work, but it is unclear how it compares to using TSDAE alone or to using both methods together.

artmatsak commented 1 year ago

I'm curious about this, too. We'd like to do domain adaptation for all-mpnet-base-v2.

gabriead commented 1 year ago

Any update on this?

rbbby commented 1 year ago

I went for TSDAE plus fine-tuning on NLI and STS corpora (there are scripts in the sentence-transformers GitHub repository that you can use). Seems to have worked great!
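
For anyone looking for a starting point, here is a minimal sketch of the TSDAE stage only, loosely following the unsupervised TSDAE example in sentence-transformers. It is not the exact script used above. The base model (`bert-base-uncased`), the hyperparameters, and the `prepare_sentences` helper are assumptions for illustration, and `DenoisingAutoEncoderDataset` needs `nltk` with its tokenizer data installed. The supervised fine-tuning on NLI and STS would be a separate step using the existing training scripts.

```python
def prepare_sentences(raw_lines, min_words=3):
    """Hypothetical helper: normalize whitespace, drop very short and duplicate lines."""
    seen = set()
    sentences = []
    for line in raw_lines:
        text = " ".join(line.split())  # collapse runs of whitespace
        if len(text.split()) < min_words or text in seen:
            continue
        seen.add(text)
        sentences.append(text)
    return sentences


def tsdae_adapt(sentences, base_model="bert-base-uncased", epochs=1, batch_size=8):
    """Unsupervised TSDAE training on unlabeled in-domain sentences (sketch)."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, models, datasets, losses

    # Encoder with CLS pooling, as in the sentence-transformers TSDAE example.
    word_embedding = models.Transformer(base_model)
    pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), "cls")
    model = SentenceTransformer(modules=[word_embedding, pooling])

    # The dataset adds deletion noise on the fly; the loss reconstructs the original sentence.
    train_data = datasets.DenoisingAutoEncoderDataset(sentences)
    loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    loss = losses.DenoisingAutoEncoderLoss(
        model, decoder_name_or_path=base_model, tie_encoder_decoder=True
    )

    model.fit(
        train_objectives=[(loader, loss)],
        epochs=epochs,
        weight_decay=0,
        scheduler="constantlr",
        optimizer_params={"lr": 3e-5},
        show_progress_bar=True,
    )
    return model
```

Usage would look like `model = tsdae_adapt(prepare_sentences(open("corpus.txt")))`, where `corpus.txt` is a placeholder for your own unlabeled domain text, followed by `model.save(...)` before the supervised stage.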