facebookresearch / pysparnn

Approximate Nearest Neighbor Search for Sparse Data in Python!

What is the impact of using dense vectors? #33

Open johann-petrak opened 3 years ago

johann-petrak commented 3 years ago

From a very quick test with a small index, this seems to work well with dense vectors (I tried d=300), but is there any specific impact of using dense vectors on performance for building or searching the index?

spencebeecher commented 3 years ago

Hi Johann, thanks for the question. It should work just fine with dense vectors. The algorithm's design choices are optimized for sparse vectors, but it works on dense vectors too. Note that there are very good libraries for dense vector search that should beat this approach (Faiss, Annoy).

johann-petrak commented 3 years ago

Thank you!

Thanks also for pointing out those other libraries. I had already looked at Annoy, but unlike pysparnn, Annoy does not seem to support adding to an existing index. At least this is not documented anywhere.

Faiss does look very interesting though! Is there a rough estimate of how pysparnn compares to Faiss with regard to performance and precision/recall?

spencebeecher commented 3 years ago

If I recall correctly, Faiss has a hierarchical navigable small world (HNSW) graph implementation that should do very well on speed, precision, and recall. That implementation should work on both dense and sparse vectors. I'm not sure whether that specific implementation supports incremental updates. I would check their benchmarks. For dense vectors I remember Annoy being at least 2x faster than pysparnn (I could be wrong), and I think Faiss has benchmarks comparing it to Annoy.

I would expect that library to do significantly better than this one on speed and accuracy. Definitely worth trying out.
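For anyone who wants to measure precision/recall rather than rely on estimates, here is a minimal sketch of how recall@k could be computed against an exact brute-force search. It uses only the Python standard library and calls no pysparnn, Faiss, or Annoy API. The helper names (`cosine_distance`, `exact_top_k`, `recall_at_k`), the random data, and the hand-made `approx` list are assumptions for illustration only; in a real comparison `approx` would hold the ids returned by the index under test.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: indices of the k vectors closest to query."""
    order = sorted(
        range(len(vectors)),
        key=lambda i: cosine_distance(query, vectors[i]),
    )
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate result contains."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


# Hypothetical dense data with d=300, as in the question above.
rng = random.Random(0)
dim, n_vectors, k = 300, 200, 5
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_vectors)]
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]

truth = exact_top_k(query, vectors, k)

# Stand-in for an approximate index result (an assumption, not library output):
# four true neighbors plus one id that is not among the true top k.
wrong_id = next(i for i in range(n_vectors) if i not in truth)
approx = truth[:-1] + [wrong_id]

print(recall_at_k(approx, truth))  # 4 of 5 true neighbors found: 0.8
```

Averaging this value over many queries, and timing the index build and search calls separately, would give a like-for-like comparison between libraries on the same data.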
