Closed taylorreiter closed 1 year ago
~I swapped around all of the branches to try and get the CI to run, but it's not running, which is frustrating.~ CI needed to be enabled in the repo settings. Now good to go.
`nf-core lint` overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 20f54ea
+| ✅ 131 tests passed |+
#| ❔ 17 tests were ignored |#
!| ❗ 11 tests had warnings |!
This PR integrates the sourmash sketch and sourmash gather modules into a DSL2 workflow, tacking them on after fastqc and multiqc. I use the nf-core sketch and gather modules, and create my own module to download the contamination database used by gather. I hard-coded the settings for now because these should be our defaults, but in the future it could be cool to parameterize the whole workflow on ksize if there's a need for that.
~One thing I haven't figured out is how to make the contamination database small for test runs. Right now, I have it running against a human genome signature and all of GTDB reps. This is way too big for a test run. The test should just run against human.~ I solved this by making a small contamination database over here: https://github.com/Arcadia-Science/seqqc-build-contam-db
One unresolved thing from this PR -- right now, if there are no matches in the sample against the gather database, no csv is output. This could be ok, but I think it would be better to output an empty csv. I have an issue open here: https://github.com/sourmash-bio/sourmash/issues/2357. I think an empty csv would be better because then I could check its contents and explicitly write a report that summarizes that there is no contam.
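As a rough sketch of the check described above (not part of this PR), a report step could treat a missing CSV and an empty CSV the same way. The function name `summarize_gather_csv` and the report wording are hypothetical, and the only assumption about the gather output is that it is a CSV with a header row:

```python
import csv
import os


def summarize_gather_csv(path):
    """Return a one-line contamination summary (hypothetical helper)."""
    # Today, gather writes no CSV at all when nothing matches,
    # so a missing file is treated as "no contamination".
    if not os.path.exists(path):
        return "No contamination detected (no gather CSV was written)."

    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))

    # A zero-byte or header-only CSV would mean the same thing
    # if gather starts writing empty files.
    if not rows:
        return "No contamination detected (gather CSV is empty)."

    return f"Potential contamination: {len(rows)} match(es) found."
```

Handling both cases means the report would not need to change if sourmash later starts writing an empty CSV.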
This PR also cleans up a few things I found while writing this workflow -- sorry that makes it a bit scattered.
Some of the recommended to-do items aren't done yet because the pipeline isn't advanced enough. I think that's ok and have crossed them out below.
PR checklist
- Code lints (`nf-core lint`).
- Test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- ~`docs/usage.md` is updated.~
- ~`docs/output.md` is updated.~
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).