UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.41k stars 2.49k forks

query adapter native in training #3084

Open achibb opened 11 hours ago

achibb commented 11 hours ago

Hi there!

Now that using adapters works, would it make sense to add native support so you can use an adapter for the query / sentence2 side when training with model.train?

tomaarsen commented 11 hours ago

Hello!

That's kind of a cool setup - I'm more familiar with e.g. finetuning 2 adapters (one for queries, one for docs), but you can indeed also use 1 query adapter and just the normal SentenceTransformerTrainer for the documents. Intuitively, you might get the best performance if you first finetune a model on your documents normally and then finetune an adapter on top of that document-finetuned model. Otherwise you have to train your adapter on the base model - that might do a bit worse.
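For what it's worth, the asymmetric idea can be sketched with a toy example (plain Python stand-ins, not the sentence-transformers or PEFT APIs): documents go through the frozen base encoder only, while queries additionally pass through a small LoRA-style residual adapter on top of the base embedding.

```python
def base_encode(text):
    # Stand-in for a frozen base embedding model: a fake 4-dim embedding.
    return [float(len(text) % 7), float(text.count(" ")), 1.0, 0.5]

class QueryAdapter:
    """LoRA-flavoured residual: y = x + B(A x), with tiny rank-1 projections."""
    def __init__(self, dim=4):
        self.A = [0.1] * dim  # down-projection (rank 1)
        self.B = [0.2] * dim  # up-projection (rank 1)

    def __call__(self, x):
        h = sum(a * xi for a, xi in zip(self.A, x))      # project to a rank-1 scalar
        return [xi + b * h for b, xi in zip(self.B, x)]  # residual update

adapter = QueryAdapter()

def encode(text, is_query):
    x = base_encode(text)
    return adapter(x) if is_query else x  # the adapter only touches queries
```

Only the adapter's few parameters would be trainable in this setup; the document side stays identical to the base model, so existing document embeddings don't need re-indexing.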

Having said that, these are all just guesses. I'm still training PEFT models myself to get a feel for the performance & so I can write documentation. This is my sneak peek:

Here the PEFT model reached 0.4705 NDCG@10 on the NanoBEIR datasets, whereas the base model reached 0.4728 NDCG@10 on the same datasets. At the same time, the PEFT model requires a lot less memory during training. I still have to try and scale this up to a larger model.

In short: I can't really say right now - I'm not familiar enough with PEFT and embedding models. If you'd really like to know, perhaps you can ask the Jina folks. They've trained a few models with PEFT, like https://huggingface.co/jinaai/jina-embeddings-v3.

achibb commented 9 hours ago

Thanks as always for the help and the quick reply - sounds great:

https://weaviate.io/papers/axn

Here they explain a cool paper where they use a query adapter (paired with iterations and a cross encoder) to achieve cross encoder quality. Might be generally interesting :-)

tomaarsen commented 6 hours ago

I hadn't heard of that one yet - fascinating! Knowledge distillation is very powerful, we use it here as well for the MarginMSELoss: https://sbert.net/examples/training/ms_marco/README.html#marginmse
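For reference, the core idea behind MarginMSE distillation is small enough to sketch in a few lines (a schematic, not the library's actual MarginMSELoss implementation): the student is trained so that its score margin between a positive and a negative document matches the margin predicted by a teacher cross-encoder.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Squared error between the student's and teacher's (pos - neg) score margins."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return (student_margin - teacher_margin) ** 2

# The loss is ~0 whenever the student reproduces the teacher's margin,
# even if the absolute scores differ:
loss = margin_mse(student_pos=0.9, student_neg=0.4,
                  teacher_pos=0.8, teacher_neg=0.3)
```

Because only the margin matters, the student's scores don't need to live on the same scale as the cross-encoder's, which is what makes this form of distillation so convenient for bi-encoders.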

But our solution is quite a bit simpler, hah. I think their approach might not work out of the box, as I don't think there's currently a way to add an adapter but only apply it to a portion of the inference (i.e. only the queries). It would require a custom loss/trainer/model, I reckon.
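To illustrate what such a custom setup would have to do, here is a toy sketch (invented stand-in classes, not sentence-transformers code) of toggling an adapter branch on only while encoding queries, similar in spirit to PEFT's enable/disable-adapter mechanism:

```python
from contextlib import contextmanager

class ToyEncoder:
    """Stand-in for an embedding model with an optional adapter branch."""
    def __init__(self):
        self.adapter_enabled = False

    def encode(self, text):
        base = [float(len(text)), 1.0]
        if self.adapter_enabled:
            return [v * 1.1 for v in base]  # pretend this is the adapter's output
        return base

@contextmanager
def adapter_enabled(model):
    # Temporarily switch the adapter branch on, restoring it afterwards.
    model.adapter_enabled = True
    try:
        yield model
    finally:
        model.adapter_enabled = False

model = ToyEncoder()
docs = [model.encode(d) for d in ["doc one", "doc two"]]  # base path only
with adapter_enabled(model):
    query_emb = model.encode("a query")                   # adapter path
```

A custom loss or trainer could wrap the query-side forward pass in exactly this kind of context, leaving the document-side forward pass untouched.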