UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
14.84k stars 2.44k forks

Hybrid Search with sentence transformers: easier approach #2340

Open avvRobertoAlma opened 10 months ago

avvRobertoAlma commented 10 months ago

I have a dataset of about 2 million sentences. Each sentence has some metadata (for example "authority": 1 or "subject": "liability of directors"). My goal is to perform a hybrid search where the corpus is first filtered (for example, the retrieved documents must have the subject "liability of directors") and then a similarity search based on cosine distance is executed on what remains. What would be the simplest/best approach? I also tried a for loop that scans the whole corpus on each search and runs the similarity search only on the filtered items, but I don't know if that is a good approach.

rnckp commented 10 months ago

This might be worth looking into: https://weaviate.io/developers/weaviate/search/hybrid
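
If a vector database is more than is needed, the filter-then-search idea from the question can also be sketched with NumPy alone. This is a minimal sketch under assumptions, not a recommended implementation: the embeddings are taken to be already computed (in practice they would come from something like `SentenceTransformer.encode`), the metadata is held in a NumPy array aligned row by row with the embeddings, and the function name `filtered_cosine_search` and the toy vectors are made up for illustration. The boolean mask replaces the Python for loop, so the filter and the scoring are both vectorised.

```python
import numpy as np


def filtered_cosine_search(query_vec, corpus_emb, mask, top_k=5):
    """Return (corpus_index, cosine_score) pairs for the best rows allowed by mask.

    Hypothetical helper: `mask` is a boolean array with one entry per corpus row.
    """
    candidate_idx = np.flatnonzero(mask)  # corpus rows that pass the metadata filter
    if candidate_idx.size == 0:
        return []

    # Normalise both sides so that the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    cand = corpus_emb[candidate_idx]
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)

    scores = cand @ q

    # argpartition finds the k best candidates without sorting all of them;
    # only those k are then sorted by descending score.
    k = min(top_k, scores.size)
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]

    # Map positions in the filtered subset back to indices in the full corpus.
    return [(int(candidate_idx[i]), float(scores[i])) for i in top]


# Toy data (invented): four 2-dimensional "embeddings" and their subjects.
corpus_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
subjects = np.array([
    "liability of directors",
    "tax",
    "liability of directors",
    "liability of directors",
])
query_vec = np.array([1.0, 0.0])

mask = subjects == "liability of directors"
results = filtered_cosine_search(query_vec, corpus_emb, mask, top_k=2)
print(results)
```

Row 1 is close to the query but has the wrong subject, so it never reaches the scoring step. With 2 million sentences it would probably make sense to normalise the corpus embeddings once up front instead of on every query, and to build the mask from a pandas column or a precomputed index per metadata value.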