dkoslicki / CMash

Fast and accurate set similarity estimation via containment min hash
BSD 3-Clause "New" or "Revised" License

Deal with the duplicate small k-mers in the sketches #7

Closed by dkoslicki 6 years ago

dkoslicki commented 6 years ago

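That is, when k-mers in a sketch are duplicated at a smaller k-mer size, do something more intelligent than dividing the raw hit count by `num_hashes`: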

containment_indices[:, k_size_loc] = (hit_matrices[k_size_loc].sum(axis=1).ravel()/float(num_hashes))
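One possible direction, sketched below in plain Python rather than against the CMash code itself: when sketched k-mers are truncated to a smaller size, several of them can collapse to the same prefix, so it may make sense to count each distinct prefix once in both the hit count and the denominator. The function name `containment_with_unique_prefixes`, the `reference_kmers` argument, and the toy data are hypothetical, and whether the number of unique prefixes is the right denominator is an assumption, not something this issue settles.

```python
def containment_with_unique_prefixes(sketch_kmers, reference_kmers, k_size):
    """Estimate containment at k_size, counting each truncated k-mer once.

    Hypothetical helper for illustration; not part of CMash.
    """
    # Truncate each sketched k-mer to its first k_size characters.
    # Building a set collapses prefixes that became duplicates.
    unique_prefixes = {kmer[:k_size] for kmer in sketch_kmers}
    if not unique_prefixes:
        return 0.0
    reference_prefixes = {kmer[:k_size] for kmer in reference_kmers}
    # Each distinct prefix contributes at most one hit.
    hits = len(unique_prefixes & reference_prefixes)
    # Assumption: normalize by the number of distinct prefixes,
    # not by the original sketch size (num_hashes).
    return hits / len(unique_prefixes)


# Toy data: three of the four sketched 5-mers share the 3-mer prefix "ACG".
sketch = ["ACGTA", "ACGTC", "ACGGA", "TTTTT"]
reference = ["ACGTA", "ACGTG", "GGGGG"]

# Naive estimate, analogous to dividing the hit count by num_hashes.
reference_3mers = {r[:3] for r in reference}
naive = sum(k[:3] in reference_3mers for k in sketch) / len(sketch)

dedup = containment_with_unique_prefixes(sketch, reference, 3)
print(naive, dedup)
```

In this toy case the naive estimate counts the shared prefix "ACG" three times, while the deduplicated version counts it once, so the two estimates differ.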