UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Semantic Search for the large corpora #853

Open JoyeBright opened 3 years ago

JoyeBright commented 3 years ago

Greetings,

I'd like to know which technique or implementation you would suggest for finding similar sentences in a very large corpus, for instance 31M sentences.

I should mention that my question is more about memory management than speed.

Cheers, Javad
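
For a sense of scale, here is a rough back-of-the-envelope estimate of how much memory the raw embedding matrix for such a corpus would need. The embedding dimensions (384 and 768) and the storage types are assumptions for illustration; the question does not name a model. At 768 dimensions in float32, 31M vectors come to about 95 GB, which is why memory rather than speed becomes the constraint.

```python
# Rough memory footprint of a dense embedding matrix.
# ASSUMPTION: the dimensions and dtypes below are illustrative only;
# the thread does not say which model or precision is used.

def embedding_bytes(num_sentences: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold num_sentences vectors of `dim` values each."""
    return num_sentences * dim * bytes_per_value

NUM_SENTENCES = 31_000_000  # corpus size mentioned in the question

for dim in (384, 768):
    for label, width in (("float32", 4), ("float16", 2), ("int8", 1)):
        gib = embedding_bytes(NUM_SENTENCES, dim, width) / 2**30
        print(f"dim={dim:4d}  {label:8s} {gib:7.1f} GiB")
```

The estimate covers only the vectors themselves; any index structure built on top adds its own overhead.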

aquibjaved commented 3 years ago

Hey @JoyeBright

For similarity search, you can try the following options. If you want to implement it from scratch, you can use the following:

Production-ready pipelines you can use:

nreimers commented 3 years ago

Have a look at ANN here: https://www.sbert.net/examples/applications/semantic-search/README.html#approximate-nearest-neighbor
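
As a memory-bounded baseline to compare ANN against, the sketch below does exact (not approximate) cosine top-k search while scanning the corpus in chunks, so only one chunk of embeddings has to be resident at a time. It is a minimal NumPy illustration, not code from sentence-transformers: the function name `chunked_top_k`, the chunk size, and the random stand-in embeddings are all hypothetical, and it assumes the vectors are already L2-normalized. With a real corpus, the embeddings could be stored on disk and opened with `np.memmap` instead of being loaded into RAM.

```python
import numpy as np

def chunked_top_k(query, corpus, k=5, chunk_size=100_000):
    """Exact cosine top-k over `corpus`, scanning `chunk_size` rows at a time.

    ASSUMPTION: `query` and every row of `corpus` are L2-normalized, so the
    dot product equals cosine similarity. `corpus` may be an np.memmap.
    """
    best_scores = np.full(k, -np.inf, dtype=np.float32)
    best_ids = np.full(k, -1, dtype=np.int64)
    for start in range(0, corpus.shape[0], chunk_size):
        # Materialize only this slice of the corpus in memory.
        chunk = np.asarray(corpus[start:start + chunk_size], dtype=np.float32)
        scores = chunk @ query
        ids = np.arange(start, start + len(scores), dtype=np.int64)
        # Merge the chunk with the running best and keep the k highest scores.
        all_scores = np.concatenate([best_scores, scores])
        all_ids = np.concatenate([best_ids, ids])
        keep = np.argsort(-all_scores)[:k]
        best_scores, best_ids = all_scores[keep], all_ids[keep]
    return best_ids, best_scores

# Tiny demo with random stand-in embeddings (hypothetical data).
rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 32)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
ids, scores = chunked_top_k(corpus[42], corpus, k=3, chunk_size=128)
print(ids, scores)
```

Peak memory here is governed by `chunk_size` rather than corpus size, at the cost of a full scan per query. The ANN approaches described at the link above trade some recall for much faster lookups.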