Arcadia-Science / seqqc

A Nextflow pipeline to identify quality control issues with new sequencing data.
MIT License

Turn the nf-core modules into a workflow: sourmash sketch & sourmash gather #4

Closed taylorreiter closed 2 years ago

taylorreiter commented 2 years ago

This PR integrates the sourmash sketch and sourmash gather modules into a DSL2 workflow, running them after fastqc and multiqc. I use the nf-core sketch and gather modules, and I wrote my own module to download the GTDB reps and human (k = 21) databases/signatures. I hard-coded this for now because these should be our defaults, but in the future it could be cool to parameterize the whole workflow on ksize if there's a need for that.

One thing I haven't figured out is how to make the contamination database small for test runs. Right now, it runs against a human genome signature and all of GTDB reps. This is way too big for a test run. The test should just run against human.
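One possible approach, sketched as a Nextflow config fragment: expose the list of databases as a parameter and override it in the test profile. The parameter name `contamination_dbs` and the labels `human` and `gtdb_reps` are hypothetical, not names from this PR, and the download module would still need to be changed to read the parameter.

```groovy
// nextflow.config (sketch only; `contamination_dbs` and the labels below
// are hypothetical names, not taken from this pipeline)
params {
    // Databases/signatures to search with sourmash gather in a full run.
    contamination_dbs = ['human', 'gtdb_reps']
}

profiles {
    test {
        // Test runs search only the small human signature.
        params.contamination_dbs = ['human']
    }
}
```

With something like this in place, running with `-profile test` would skip the GTDB reps download entirely, while a default run would keep both databases.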


taylorreiter commented 2 years ago

I'm going to close this PR because I implemented this work over in #8, but re-arranged a bunch of branches and stuff.