dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

unique exact matches hll #58

Open: lutfia95 opened this issue 3 years ago

lutfia95 commented 3 years ago

Hi,

I have a question about unique exact matches. Can I use ./dashing hll to count all matches rather than only the unique exact ones? I need the total number of matches, not just the unique ones. For my sensitivity analysis it is important to check the total number of matches; the number of unique exact matches alone is not really useful in my experiment.

Thanks!

Cheers,

Ahmad

dnbaker commented 3 years ago

Hi Ahmad,

I'm happy to help, but I'm not quite sure exactly what you're looking for.

Are you looking for multiset similarity, where multiple instances of the same k-mer are counted multiple times? You can compute this exactly with dashing <dist/sketch> --wj-exact [input files], or approximately, using a count-min sketch, via dashing <cmd> --wj. See the Streaming Weighted Jaccard section of the usage message.
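To make the difference between "unique" matches and multiset matches concrete, here is a minimal Python sketch. It is not dashing's implementation: it counts forward-strand k-mers only (no canonical/reverse-complement handling, no hashing) and computes both the ordinary set Jaccard, where each shared k-mer counts once, and the weighted (multiset) Jaccard, which sums the per-k-mer minimum counts over the per-k-mer maximum counts. The function names are hypothetical and chosen for illustration.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer occurrence in seq (forward strand only, no canonicalization)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def set_jaccard(a: Counter, b: Counter) -> float:
    """Jaccard over distinct k-mers: each k-mer counts once, however often it occurs."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Multiset (weighted) Jaccard: sum of min counts over sum of max counts."""
    keys = set(a) | set(b)
    num = sum(min(a[x], b[x]) for x in keys)  # Counter returns 0 for missing keys
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den if den else 1.0


if __name__ == "__main__":
    x = kmer_counts("ACGTACGTACGT", 4)
    y = kmer_counts("ACGTACGTTTTT", 4)
    print(f"set Jaccard:      {set_jaccard(x, y):.3f}")
    print(f"weighted Jaccard: {weighted_jaccard(x, y):.3f}")
```

For these two toy sequences the set Jaccard is 4/7 (about 0.571), while the weighted Jaccard is 5/13 (about 0.385), because ACGT occurs three times in the first sequence but only twice in the second, and the repeated TTTT in the second sequence adds weight to the union.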

On the other hand, you might be looking for exact k-mer counts/matches; in that case, you can replace the HLL with sorted hash sets via the --use-full-khash-sets option.
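As a rough illustration of what an exact, unsketched comparison looks like, the following Python sketch hashes every k-mer into a 64-bit integer, stores the distinct hashes as a sorted list, and counts shared k-mers with a linear merge. This is only an assumption-laden toy: blake2b is used purely for convenience and is not the hash dashing uses, the function names are hypothetical, and the code does not reflect how --use-full-khash-sets is implemented internally. Like any set-based comparison, it still counts each shared k-mer once; for total (multiset) matches you need the weighted approach.

```python
import hashlib


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (blake2b chosen only for illustration)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sorted_hash_set(seq: str, k: int) -> list[int]:
    """Hash every k-mer, drop duplicates, and sort: an exact k-mer set, no sketching."""
    seq = seq.upper()
    return sorted({hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)})


def intersection_size(a: list[int], b: list[int]) -> int:
    """Count shared hashes with a linear merge of two sorted, duplicate-free lists."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def exact_jaccard(a: list[int], b: list[int]) -> float:
    """Exact Jaccard of two sorted hash sets: |A & B| / |A | B|."""
    shared = intersection_size(a, b)
    union = len(a) + len(b) - shared
    return shared / union if union else 1.0


if __name__ == "__main__":
    a = sorted_hash_set("ACGTACGTACGT", 4)
    b = sorted_hash_set("ACGTACGTTTTT", 4)
    print(len(a), len(b), intersection_size(a, b), f"{exact_jaccard(a, b):.3f}")
```

Keeping the sets sorted means the intersection is a single pass over both lists, at the cost of storing every distinct hash; this is why exact sets use far more memory than a fixed-size HLL.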

Thanks for asking, and I'm happy to help further.

Best,

Daniel