andreasvc / roaringbitmap

Roaring Bitmap in Cython
http://roaringbitmap.readthedocs.io
GNU General Public License v2.0

`MultiRoaringBitmap.jaccard_dist` against a query coming from an external `RoaringBitmap` #32

Closed ljmartin closed 2 years ago

ljmartin commented 2 years ago

Hi rbm, is it possible to calculate bulk Jaccard distances across a MultiRoaringBitmap when the query is not itself a member of the MultiRoaringBitmap?

A straightforward way might be:

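from roaringbitmap import MultiRoaringBitmap, RoaringBitmap

multi_rb = MultiRoaringBitmap(list_of_indices, filename='index')
rbm = RoaringBitmap([0, 3, 6])
# Compare the external query against each bitmap in the collection
jacs = [rbm.jaccard_dist(b) for b in multi_rb]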

Perhaps there's not much overhead in working directly in Python, but I figured there might be a cleverer/faster way to do this. Thanks! Lewis
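
For reference, the quantity being asked for can be sketched with plain Python sets standing in for the bitmaps. This is only an illustration of the Jaccard distance, 1 - |A ∩ B| / |A ∪ B|, not the library's implementation; the helper names `jaccard_dist` and `jaccard_dist_to_all` and the handling of two empty sets are assumptions made for this sketch.

```python
def jaccard_dist(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = len(a | b)
    if union == 0:
        # Assumption for this sketch: two empty sets are at distance 0.
        return 0.0
    return 1.0 - len(a & b) / union


def jaccard_dist_to_all(query, collection):
    """Distance from one external query set to every set in a collection."""
    return [jaccard_dist(query, other) for other in collection]


# Plain frozensets stand in for the bitmaps of a MultiRoaringBitmap.
collection = [frozenset([0, 3, 6]), frozenset([0, 3]), frozenset([1, 2])]
query = frozenset([0, 3, 6])  # the query need not be a member of the collection

dists = jaccard_dist_to_all(query, collection)
```

An identical set gives a distance of 0, a disjoint set gives 1, and partial overlap falls in between.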

andreasvc commented 2 years ago

I added a method for this:

multi_rb.jaccard_dist_single(rbm)

ljmartin commented 2 years ago

cool, thanks a lot!