Closed: reisner closed this issue 5 years ago
No plans for that. How big is the query? Maybe it makes sense to compute the exact similarity between the query and the DB? That is just a single matrix multiplication.
Fri, 16 Aug 2019, 20:52 Roman Eisner notifications@github.com:
From what I understand, the get_similar_pairs_cosine function finds pairs within a dataset. However, what if we are finding similarity between a query and the dataset? Would be nice to have this functionality here. Do you have plans to update this package, or know of others that do this?
Yeah, it's just a single query, but a very large DB. I would like it to be quick, in real time. The problem is that the database is going to be tens of millions of rows, and brute force starts to be slow at that size.
My suggestion is to benchmark. As I remember, I had a similar task: a query against 20M book titles. I started building LSH-based retrieval, but in the end switched to brute force since it was faster.
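For reference, the brute-force approach discussed above (exact cosine similarity between one query and every row of the DB) can be sketched roughly as follows. This is a minimal illustration in plain Python, not code from LSHR; the helper names `normalize` and `top_k_cosine` and the toy data are hypothetical. It assumes the DB rows are normalized once up front, so that each query costs one matrix-vector product.

```python
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def top_k_cosine(query, db_unit, k=3):
    """Return (row_index, similarity) for the k rows most similar to query.

    db_unit must already hold unit-length rows; the list comprehension
    below is the matrix-vector product (db_unit times q) written by hand.
    """
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(row, q)) for row in db_unit]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]

# Toy data (hypothetical): four 3-dimensional "embeddings".
db = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
db_unit = [normalize(row) for row in db]  # done once, not per query
result = top_k_cosine([2.0, 0.1, 0.0], db_unit, k=2)
```

With tens of millions of rows, the same computation would normally be handed to a vectorized, BLAS-backed matrix product rather than a Python loop. Whether that beats an approximate index for a given dataset is exactly what the benchmark should decide.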
Oh, I did benchmark, that's why I'm here :) No problem if you're not planning on implementing this, thanks!
Is your DB a sparse or a dense matrix?
Dense matrix; it's a word embedding database.
That's easy then: take a look at RcppAnnoy or https://github.com/jlmelville/rcpphnsw
Thanks for that!
From what I understand, the get_similar_pairs_cosine function finds pairs within a dataset. However, what if we are finding similarity between a query and the dataset? Would be nice to have this functionality here. Do you have plans to update this package, or know of others that do this?