UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Semantic search for large corpora #853

JoyeBright opened this issue 3 years ago

JoyeBright commented 3 years ago

Greetings,

I wanted to ask which technique or implementation you would suggest for finding similar sentences in a very large corpus, for instance one with 31M sentences.

It is worth mentioning that my question is more concerned with memory management than with speed.
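For a sense of scale, a back-of-the-envelope estimate of the raw embedding matrix helps. This is a minimal sketch that assumes 768-dimensional embeddings (typical of BERT-base models) and 384-dimensional ones (typical of smaller MiniLM-style models). These dimensions are assumptions rather than figures from the question, and index structures and Python overhead are ignored.

```python
# Rough memory estimate for holding all sentence embeddings in RAM.
# Assumption: dense vectors only; index structures and Python overhead are ignored.

def embedding_matrix_bytes(num_sentences: int, dim: int, bytes_per_value: int) -> int:
    """Return the size in bytes of a num_sentences x dim matrix."""
    return num_sentences * dim * bytes_per_value


NUM_SENTENCES = 31_000_000  # corpus size mentioned in the question

# (dimension, dtype name, bytes per value): assumed model sizes and dtypes
CONFIGS = [
    (768, "float32", 4),
    (768, "float16", 2),
    (384, "float32", 4),
    (384, "float16", 2),
]

for dim, dtype, nbytes in CONFIGS:
    size = embedding_matrix_bytes(NUM_SENTENCES, dim, nbytes)
    print(f"dim={dim:4d} {dtype}: {size / 1e9:6.1f} GB")
```

Under these assumptions the float32 matrix for 768 dimensions is about 95.2 GB, which may not fit comfortably in memory on many machines; halving the precision or the dimension halves that figure.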

Cheers, Javad

aquibjaved commented 3 years ago

Hey @JoyeBright

For similarity search, you can try the following options. If you want to implement it from scratch, you can use the following:

Production-ready pipelines you can use:

nreimers commented 3 years ago

Have a look at approximate nearest neighbor (ANN) search here: https://www.sbert.net/examples/applications/semantic-search/README.html#approximate-nearest-neighbor
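To illustrate the idea behind one common family of ANN indexes, the inverted-file (IVF) index used by faiss's IVF index types, here is a minimal pure-Python sketch. It is not any library's implementation. The class name `TinyIVFIndex` is hypothetical, the parameter names `nlist` and `nprobe` are borrowed from faiss terminology, and centroids are sampled from the data instead of being trained with k-means, purely for brevity.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class TinyIVFIndex:
    """Hypothetical minimal inverted-file index over unit-normalized vectors."""

    def __init__(self, nlist: int, seed: int = 0) -> None:
        self.nlist = nlist
        self.rng = random.Random(seed)
        self.centroids: List[Vector] = []
        # One posting list per centroid, holding (corpus id, vector) pairs.
        self.lists: Dict[int, List[Tuple[int, Vector]]] = {}

    def _nearest_centroids(self, v: Vector, n: int) -> List[int]:
        """Ids of the n centroids with the highest cosine similarity to v."""
        scores = [(dot(v, c), i) for i, c in enumerate(self.centroids)]
        return [i for _, i in heapq.nlargest(n, scores)]

    def build(self, vectors: Sequence[Sequence[float]]) -> None:
        data = [normalize(v) for v in vectors]
        # Simplification: sample centroids from the data instead of running k-means.
        self.centroids = self.rng.sample(data, self.nlist)
        self.lists = {i: [] for i in range(self.nlist)}
        for idx, v in enumerate(data):
            self.lists[self._nearest_centroids(v, 1)[0]].append((idx, v))

    def search(self, query: Sequence[float], k: int, nprobe: int) -> List[Tuple[float, int]]:
        """Return up to k (cosine score, corpus id) pairs, scanning only nprobe lists."""
        q = normalize(query)
        candidates = (
            (dot(q, v), idx)
            for c in self._nearest_centroids(q, nprobe)
            for idx, v in self.lists[c]
        )
        return heapq.nlargest(k, candidates)


# Usage: a toy corpus of random 16-dimensional vectors.
rng = random.Random(42)
corpus = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
index = TinyIVFIndex(nlist=20)
index.build(corpus)

# nprobe == nlist scans every list (exact search); smaller nprobe trades recall for speed.
hits = index.search(corpus[7], k=3, nprobe=20)
print(hits[0][1])  # 7: the query vector itself is the best match
```

Note that this sketch still keeps every full vector in RAM; it only reduces how many vectors each query has to scan. Reducing memory, the main concern in the question, would additionally require compressing the stored vectors (for example with product quantization, as in faiss's `IndexIVFPQ`) or keeping them on disk.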