PrithivirajDamodaran opened this issue 2 years ago
Training the cross encoder will not provide a benefit for the embedding model.
What you could do is train the cross encoder and then use it either to add more training data or to train the bi-encoder with MarginMSELoss.
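A rough sketch of that two-step recipe, assuming the sentence-transformers library; the data and model names below are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.cross_encoder import CrossEncoder

# Step 1: a trained cross encoder scores (query, passage) pairs.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Hypothetical (query, positive passage, negative passage) triplets.
triplets = [
    ("what is a cross encoder?",
     "A cross encoder jointly encodes a text pair and outputs a score.",
     "Paris is the capital of France."),
]

# Step 2: the MarginMSELoss label is the cross encoder's score margin,
# CE(query, pos) - CE(query, neg).
train_examples = []
for query, pos, neg in triplets:
    pos_score, neg_score = cross_encoder.predict([(query, pos), (query, neg)])
    train_examples.append(
        InputExample(texts=[query, pos, neg], label=float(pos_score - neg_score))
    )

# Step 3: distill those margins into the bi-encoder.
bi_encoder = SentenceTransformer("msmarco-distilbert-base-v4")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MarginMSELoss(model=bi_encoder)
bi_encoder.fit(train_objectives=[(train_dataloader, train_loss)],
               epochs=1, warmup_steps=100)
```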
I have a dataset of 30K <context, query, score> triples. From the documentation, I understand that this falls under the rubric of asymmetric semantic search, with the context being a short passage. As recommended, I am planning to use an MSMARCO model trained with cosine similarity as the base model.

Is it advisable to fine-tune using a CrossEncoder with the CECorrelationEvaluator? I am asking because I am wondering whether adding a sequence-classification head is better, or whether I should just use the <context, query, score> triples with, say, CosineSimilarityLoss and fine-tune the embedding space directly. Please advise.
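To make the two options concrete, here is a rough sketch of what I mean, assuming the sentence-transformers library, scores normalized to [0, 1], and hypothetical data and model names:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

# Hypothetical <context, query, score> triples, scores in [0, 1].
data = [
    ("A short passage explaining how margins work.",
     "how do margins work?",
     0.85),
]
samples = [InputExample(texts=[query, context], label=score)
           for context, query, score in data]

# Option A: cross encoder with a single-output regression head, evaluated
# via Spearman/Pearson correlation against the gold scores (in practice
# the evaluator would use held-out dev samples, not the training set).
cross_encoder = CrossEncoder("distilroberta-base", num_labels=1)
ce_dataloader = DataLoader(samples, shuffle=True, batch_size=16)
evaluator = CECorrelationEvaluator.from_input_examples(samples, name="dev")
cross_encoder.fit(train_dataloader=ce_dataloader,
                  evaluator=evaluator, epochs=1, warmup_steps=100)

# Option B: fine-tune the bi-encoder's embedding space directly;
# CosineSimilarityLoss regresses cosine(query, context) onto the score.
bi_encoder = SentenceTransformer("msmarco-distilbert-base-v4")
bi_dataloader = DataLoader(samples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model=bi_encoder)
bi_encoder.fit(train_objectives=[(bi_dataloader, train_loss)],
               epochs=1, warmup_steps=100)
```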