beowolx / rensa

High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets
MIT License

Implement canonical c-minhash #3

Open perklet opened 4 months ago

perklet commented 4 months ago

I'm curious what the performance difference is between c-minhash and r-minhash. Are there any plans to implement the original c-minhash in this package?

jianshu93 commented 2 months ago

Same question here. The initial permutation should be fast: as I understand it, it is essentially one permutation hashing, right?

Best,

Jianshu
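
For readers following the thread, here is a minimal Python sketch of what the canonical two-permutation scheme under discussion might look like: an initial permutation (called `sigma` here) is applied once to the feature indices, and a second permutation (`pi`) is reused for every hash by shifting it circularly. This is not rensa's API or its r-minhash variant. The function names, the dense list-based permutations, and the assumption that inputs are non-empty sets of integer ids in `[0, dim)` with `num_hashes <= dim` are all hypothetical choices made for illustration.

```python
import random


def make_permutations(dim, seed=0):
    """Draw the two permutations of range(dim) used by the sketch (hypothetical helper)."""
    rng = random.Random(seed)
    sigma = list(range(dim))  # initial permutation, applied once per input
    pi = list(range(dim))     # second permutation, reused with circular shifts
    rng.shuffle(sigma)
    rng.shuffle(pi)
    return sigma, pi


def c_minhash_signature(items, sigma, pi, num_hashes):
    """Signature of a non-empty set of ids in [0, dim); assumes num_hashes <= dim."""
    dim = len(pi)
    # Step 1: apply the initial permutation once to every feature id.
    permuted = [sigma[i] for i in items]
    # Step 2: for the k-th hash, shift pi right by k positions (circularly)
    # and take the minimum value over the permuted ids.
    return [
        min(pi[(j - k) % dim] for j in permuted)
        for k in range(1, num_hashes + 1)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, used as the similarity estimate."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


sigma, pi = make_permutations(dim=64, seed=1)
sig_a = c_minhash_signature({1, 5, 9, 20}, sigma, pi, num_hashes=16)
sig_b = c_minhash_signature({1, 5, 9, 33}, sigma, pi, num_hashes=16)
print(estimate_jaccard(sig_a, sig_b))
```

In this sketch the initial permutation costs a single pass over the input ids, which is the cheap step the comment above refers to. The loop over `num_hashes` shifts dominates the running time.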