Semantic Search for large corpora

UKPLab / sentence-transformers

State-of-the-Art Text Embeddings

https://www.sbert.net

Apache License 2.0


Open JoyeBright opened 3 years ago

JoyeBright commented 3 years ago

Greetings,

Just wanted to know what technique or implementation you would suggest for finding similar sentences in a very large corpus, e.g. 31M sentences.

It's worth mentioning that my question is more about memory management than speed.

Cheers, Javad
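To put the memory concern in numbers: assuming 768-dimensional embeddings (typical of BERT-base models, and an assumption here), 31M sentences × 768 × 4 bytes (float32) is about 95 GB, or about 48 GB stored as float16. One way to keep RAM bounded, sketched below under those assumptions, is to keep the precomputed, L2-normalised embeddings in an on-disk NumPy memory-mapped array and scan it in fixed-size chunks while tracking only a running top-k. This is exact brute-force search, not an approximate index. The helpers `build_memmap` and `chunked_top_k` are hypothetical names for illustration. They are not part of sentence-transformers and not necessarily what the replies in this thread recommend.

```python
import heapq
import os
import tempfile

import numpy as np


def build_memmap(path, n_rows, dim, dtype=np.float16):
    """Hypothetical helper: create an on-disk .npy matrix that NumPy pages in lazily.

    Assumption: float16 storage is acceptable, halving the footprint vs float32.
    """
    return np.lib.format.open_memmap(path, mode="w+", dtype=dtype, shape=(n_rows, dim))


def chunked_top_k(query, corpus, k=5, chunk_size=100_000):
    """Hypothetical helper: exact cosine top-k over a (possibly memory-mapped) corpus.

    Assumes rows of `corpus` and `query` are already L2-normalised, so the
    dot product equals cosine similarity. Returns [(score, row_index), ...]
    sorted from best to worst.
    """
    query = np.asarray(query, dtype=np.float32)
    heap = []  # min-heap of (score, row_index), never larger than k
    for start in range(0, corpus.shape[0], chunk_size):
        # Only this slice is read from disk and converted to float32.
        block = np.asarray(corpus[start:start + chunk_size], dtype=np.float32)
        scores = block @ query
        # Pre-select this chunk's k best rows without a full sort.
        m = min(k, scores.shape[0])
        best = np.argpartition(-scores, m - 1)[:m]
        for i in best:
            item = (float(scores[i]), start + int(i))
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # evict the current worst hit
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    # Toy sizes standing in for 31M sentences; random vectors replace real embeddings.
    rng = np.random.default_rng(0)
    n, dim = 50_000, 64
    path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
    corpus = build_memmap(path, n, dim)
    for start in range(0, n, 10_000):  # write in batches, as an encoder would
        vecs = rng.standard_normal((min(10_000, n - start), dim)).astype(np.float32)
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
        corpus[start:start + len(vecs)] = vecs
    corpus.flush()

    query = corpus[1234]  # use a stored row as the query
    for score, row in chunked_top_k(query, corpus, k=3, chunk_size=8_192):
        print(row, round(score, 3))
```

Peak RAM here is roughly one float32 chunk plus its scores and a k-element heap, independent of corpus size; in the toy run the first hit should be row 1234 itself. Query time still grows linearly with the corpus, which is where approximate nearest-neighbour indexes become attractive.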

aquibjaved commented 3 years ago

Hey @JoyeBright

For similarity search you can try the following options. If you want to implement it from scratch, you can use the following:

Production-ready pipelines you can use:

nreimers commented 3 years ago