Closed norsedrunkensailor closed 5 months ago
Is this correct usage?
That shouldn't work @norsedrunkensailor, as Jaccard distance is a distance between sets, not continuous vectors. In our case, it's implemented for bitsets. So you may want to compare values against ones and then call np.packbits before passing to SimSIMD. Let me know if that helps 🤗
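To make the suggestion concrete, here is a minimal sketch of the packing step, assuming the input vectors hold 0/1 values. The helper names `pack_binary` and `jaccard_packed` are hypothetical, and the distance is a plain NumPy reference rather than SimSIMD's own kernel, so the snippet runs without the library installed.

```python
import numpy as np

def pack_binary(values):
    """Compare values against ones and pack the resulting bits into uint8 bytes."""
    # np.packbits pads the last byte with zeros when the length is not a multiple of 8.
    return np.packbits(np.asarray(values) == 1, axis=-1)

def jaccard_packed(a, b):
    """Reference Jaccard distance between two packed bitsets (hypothetical helper)."""
    intersection = int(np.unpackbits(a & b).sum())
    union = int(np.unpackbits(a | b).sum())
    # Two empty sets are treated as identical here; that convention is an assumption.
    return 0.0 if union == 0 else 1.0 - intersection / union

a = pack_binary([1, 0, 1, 1, 0, 0, 0, 1, 1, 0])
b = pack_binary([1, 1, 0, 1, 0, 0, 0, 0, 1, 0])
print(jaccard_packed(a, b))  # 3 shared bits out of 6 set in either vector -> 0.5
```

The packed `uint8` arrays `a` and `b` are what would then be handed to SimSIMD in place of the original float vectors.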
Ah yes, of course, sorry. I was trying to implement a method proposed in https://link.springer.com/article/10.1007/s41060-017-0064-z, which reduces the number of operations using the related Tanimoto coefficient and some bounding conditions. Using np.packbits works. Is there a way to monitor progress for large all-pairs similarity searches (batches of 10^5 by 10^5), to get an estimate of how long they will take? Thank you again 😁🍀
@norsedrunkensailor for progress tracking please check out the USearch library. It adds multithreading and custom logging functionality among other things 🤗
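For a rough time estimate without any extra library, one option is to compute the all-pairs matrix in row blocks and extrapolate from the elapsed time. The sketch below is plain NumPy and does not use the USearch or SimSIMD APIs; `all_pairs_jaccard`, `block`, and `report` are hypothetical names chosen for illustration.

```python
import time
import numpy as np

def all_pairs_jaccard(packed, block=1024, report=print):
    """Yield (start_row, distances) per row block, reporting a rough time estimate."""
    n = packed.shape[0]
    # Unpack to 0/1 floats so that a matrix product counts shared bits.
    bits = np.unpackbits(packed, axis=1).astype(np.float32)
    counts = bits.sum(axis=1)
    started = time.perf_counter()
    for start in range(0, n, block):
        stop = min(start + block, n)
        intersection = bits[start:stop] @ bits.T
        union = counts[start:stop, None] + counts[None, :] - intersection
        # Where both sets are empty, similarity defaults to 1 (an assumed convention).
        similarity = np.divide(intersection, union,
                               out=np.ones_like(intersection), where=union > 0)
        elapsed = time.perf_counter() - started
        remaining = elapsed / stop * (n - stop)  # linear extrapolation over rows
        report(f"{stop}/{n} rows, {elapsed:.1f}s elapsed, ~{remaining:.1f}s remaining")
        yield start, 1.0 - similarity
```

Each yielded block can be written to disk or reduced before the next one is computed, so the full 10^5 by 10^5 matrix never has to sit in memory at once. The unpacked `bits` array is still held in full, which may itself be large for long bitsets.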
Will it be possible to extend simsimd.cdist to allow for Jaccard distances to be calculated in batch?