dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

Minimum k-mer abundance threshold is undocumented & omitted from usage #16

bede closed 5 years ago

bede commented 5 years ago

This feature is mentioned in this very helpful reply to a now-closed issue, but is completely undocumented as far as I can tell.

Running dashing sketch -n 5 does not throw an error, but it also gives no feedback.

Application of a minimum abundance threshold is a very useful feature.

dnbaker commented 5 years ago

Hi! Thank you for pointing this out. I hadn't realized that it wasn't in the documentation.

I've added it to the documentation in: https://github.com/dnbaker/dashing/commit/4aac0988af3334c650e5d62432ccfded29915cf2. Unfortunately, because of overlapping functionality, the usage menus for the separate subcommands don't entirely match one another, but each of them is now up to date.

bede commented 5 years ago

Thanks for implementing this so quickly! I still don't find the usage wording especially clear, but it's enough.

-n Provide minimum expected count for fastq data. Default: -1 (passing all reads)

dnbaker commented 5 years ago

Great feedback, as well. I've made this a little clearer in the usage. (-n: Provide minimum expected count for fastq data. If unspecified, all kmers are passed.) I'll close this for now, but feel free to reopen it if necessary.

Thanks again!
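
For readers unsure what a minimum count threshold means in practice, here is a minimal Python sketch of the general idea: count every k-mer across the reads and keep only those seen at least a given number of times. It is not dashing's implementation. The function names (`kmers`, `filter_kmers_by_abundance`) and the exact-counting approach are assumptions made for illustration, and the sketch ignores reverse complements. A real tool may count approximately and treat canonical k-mers differently.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq (hypothetical helper)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def filter_kmers_by_abundance(reads, k, min_count):
    """Return the set of k-mers occurring at least min_count times in reads.

    Illustrative only: exact counting, no reverse-complement handling.
    A min_count of 1 or less (e.g. -1) keeps every k-mer.
    """
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    return {kmer for kmer, count in counts.items() if count >= min_count}


reads = ["ACGTAC", "ACGTTC", "ACGAAC"]
kept = filter_kmers_by_abundance(reads, k=3, min_count=2)
print(sorted(kept))
```

With a threshold of 2, k-mers that appear only once (often sequencing errors in fastq data) are dropped before any sketching would happen, while a threshold of -1 leaves the k-mer set unchanged.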