This PR integrates the sourmash sketch and sourmash gather modules into a DSL2 workflow, running them after fastqc and multiqc. I use the nf-core sketch and gather modules, and add my own module to download the GTDB reps and human (k = 21) databases/signatures. I hard-coded these for now because they should be our defaults, but in the future we could parameterize the whole workflow on ksize if there's a need for that.
One thing I haven't figured out is how to make the contamination database small for test runs. Right now, the workflow runs against a human genome signature and all of GTDB reps, which is way too big for a test run. The test should run against the human signature only.
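One possible direction, as a rough sketch only: the test profile could override a parameter that tells the download module which databases to fetch. The parameter name `sourmash_databases` is hypothetical and does not exist in this PR. The download module is currently hard-coded, so it would also need to be changed to read such a parameter.

```groovy
// conf/test.config (sketch; the parameter name below is hypothetical)
params {
    // Hypothetical parameter: which contamination databases the download
    // module should fetch. The test profile would request only the human
    // signature and skip GTDB reps.
    sourmash_databases = 'human'
}
```

The default in nextflow.config would keep both human and GTDB reps, so only `-profile test` runs would use the smaller set.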
PR checklist
[x] This comment contains a description of changes (with reason).
[ ] If you've fixed a bug or added code that should be tested, add tests!
[ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
[ ] Make sure your code lints (nf-core lint).
[x] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
[ ] Usage Documentation in docs/usage.md is updated.
[ ] Output Documentation in docs/output.md is updated.
[ ] CHANGELOG.md is updated.
[ ] README.md is updated (including new tool citations and authors/contributors).