UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Hybrid Search with sentence transformers: easier approach #2340

Status: open. avvRobertoAlma opened this issue 1 year ago.

avvRobertoAlma commented 1 year ago

I have a dataset of about 2 million sentences. Each sentence has some metadata (for example "authority": 1 or "subject": "liability of directors"). My goal is to perform a hybrid search: first the corpus is filtered (for example, the retrieved documents must have subject "liability of directors"), and then a similarity search based on cosine distance is run on the remaining items. What would be the simplest/best approach? I also tried a for loop that scans the whole corpus on every search and performs the similarity search only on the filtered items, but I don't know if that is a good approach.
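One way to avoid rescanning all 2 million rows on every query is to build a metadata index once and then rank only the matching rows. The sketch below assumes the corpus embeddings were computed once up front and L2-normalised (for example with `SentenceTransformer.encode(..., normalize_embeddings=True)`), so a dot product equals cosine similarity, and that every metadata value is hashable. `build_metadata_index` and `filtered_search` are hypothetical helper names, and the 2-D toy vectors stand in for real embeddings.

```python
from collections import defaultdict

import numpy as np


def build_metadata_index(metadata):
    """Hypothetical helper: map each (field, value) pair to the row ids that carry it.

    Built once, so a filtered query no longer loops over the whole corpus.
    Assumes metadata values are hashable (str, int, ...).
    """
    index = defaultdict(list)
    for row_id, meta in enumerate(metadata):
        for field, value in meta.items():
            index[(field, value)].append(row_id)
    return {key: np.asarray(ids, dtype=np.int64) for key, ids in index.items()}


def filtered_search(query_emb, corpus_emb, meta_index, filters, top_k=5):
    """Hypothetical helper: keep rows matching ALL filters, then rank by cosine similarity.

    Assumes query_emb and the rows of corpus_emb are already L2-normalised.
    Returns a list of (row_id, score) pairs, best first.
    """
    candidates = None
    for field, value in filters.items():
        ids = meta_index.get((field, value), np.empty(0, dtype=np.int64))
        # Intersect so that every filter must match (logical AND).
        candidates = ids if candidates is None else np.intersect1d(candidates, ids)
    if candidates is None:  # no filters given: search the whole corpus
        candidates = np.arange(len(corpus_emb))
    if candidates.size == 0:
        return []
    scores = corpus_emb[candidates] @ query_emb  # cosine similarity on normalised vectors
    k = min(top_k, candidates.size)
    # argpartition selects the top-k without sorting everything; then order those k.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(int(candidates[i]), float(scores[i])) for i in top]


def normalise(x):
    x = np.asarray(x, dtype=np.float32)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Toy data standing in for real sentence embeddings and their metadata.
corpus_emb = normalise([[1, 0], [0.9, 0.1], [0, 1], [0.7, 0.7]])
metadata = [
    {"subject": "liability of directors", "authority": 1},
    {"subject": "liability of directors", "authority": 2},
    {"subject": "liability of directors", "authority": 1},
    {"subject": "tax law", "authority": 1},
]
meta_index = build_metadata_index(metadata)
query = normalise([1, 0])
print(filtered_search(query, corpus_emb, meta_index, {"subject": "liability of directors"}, top_k=2))
```

The cost of a query then depends on how many rows pass the filter rather than on the corpus size. If you prefer to keep ranking inside the library, the scoring step could be replaced by `sentence_transformers.util.semantic_search` run on `corpus_emb[candidates]`, remembering to map each returned `corpus_id` back through `candidates`.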

rnckp commented 1 year ago

This might be worth looking into: https://weaviate.io/developers/weaviate/search/hybrid