dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

presketch: no issue question #87

gaboentropy closed this issue 2 years ago

gaboentropy commented 2 years ago

When using presketch, I assume I don't need to specify sketch sizes and k-mer lengths again? They're taken from the presketched files?

dnbaker commented 2 years ago

Hi Gabriel,

Thanks for checking in. You're right - you don't need to specify sketch sizes. The sketch sizes are read from the files (though you need to be careful not to pass HLLs of different sizes).

K-mer lengths don't affect the sketching - they only affect distances that use the k-mer length to convert Jaccard similarity into a distance. So if you're using --mash-dist or --containment-dist, you'll need to specify the k-mer length there. But if you're emitting Jaccard (the default), the output won't change with k.
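To illustrate why k matters only for the converted distances, here is a minimal Python sketch of the standard Mash distance formula, D = -ln(2J / (1 + J)) / k. It is an illustration under assumptions, not dashing's own code: the function name `mash_distance` is hypothetical, and returning 1.0 when J is 0 is a convention assumed here.

```python
import math


def mash_distance(jaccard: float, k: int) -> float:
    """Convert a Jaccard similarity into a Mash distance for k-mer length k.

    Hypothetical helper for illustration; not part of dashing.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be in [0, 1]")
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if jaccard == 0.0:
        # Assumed convention: no shared k-mers maps to the maximum distance.
        return 1.0
    # Equivalent to -ln(2J / (1 + J)) / k, written to avoid a negative zero.
    return math.log((1.0 + jaccard) / (2.0 * jaccard)) / k


if __name__ == "__main__":
    j = 0.5  # the Jaccard estimate itself carries no k
    for k in (21, 31):
        print(f"k={k}: mash distance = {mash_distance(j, k):.6f}")
```

The same Jaccard value yields a different distance for each k, which is why the k-mer length has to be supplied for the distance outputs but not for plain Jaccard.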

Let me know if you have any more questions. Thanks!

Daniel

gaboentropy commented 2 years ago

Thanks Dan.