Arcadia-Science / metagenomics

A Nextflow workflow for QC, evaluation, and profiling of metagenomic samples using short- and long-read technologies
MIT License

Sourmash additions for processing reads and assemblies #46

Closed: elizabethmcd closed this pull request 1 year ago

elizabethmcd commented 1 year ago

This PR starts to address #5 by processing QCed reads and assemblies with sourmash. The eventual goals are to: 1) compare reads vs. reads, assemblies vs. assemblies, and reads vs. assemblies; and 2) get a taxonomic breakdown of reads and assemblies.

This PR sets up the infrastructure: a sourmash_profiling subworkflow that takes in either reads or assemblies and runs them through sourmash sketch and sourmash compare. Later PRs will add support for gather, taxonomy, and comparing reads vs. assemblies. I'd like this reviewed first to make sure the logic of how I've set it up is sound.
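For reviewers less familiar with sourmash, the idea behind the sketch and compare steps can be illustrated in a few lines of pure Python. This is a simplified sketch under stated assumptions, not sourmash's implementation: it keeps every canonical k-mer hash below 2^64 / scaled (FracMinHash-style) and scores pairs by Jaccard similarity. It swaps in BLAKE2b for the MurmurHash3-based hash that sourmash uses, so hash values will not match real signatures, and all function names are hypothetical.

```python
import hashlib

# Assumption: BLAKE2b (64-bit digest) stands in for sourmash's MurmurHash3-based
# hash, so these hash values will NOT match real sourmash signatures.
MAX_HASH = 2**64
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DNA = set("ACGT")


def hash_kmer(kmer: str) -> int:
    """Return a 64-bit integer hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def canonical(kmer: str) -> str:
    """Use the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def sketch(seqs, k=21, scaled=1):
    """FracMinHash-style sketch: keep every k-mer hash below MAX_HASH // scaled."""
    threshold = MAX_HASH // scaled
    hashes = set()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if set(kmer) <= _DNA:  # skip k-mers containing N or other symbols
                h = hash_kmer(canonical(kmer))
                if h < threshold:
                    hashes.add(h)
    return hashes


def jaccard(a, b):
    """Jaccard similarity of two sketches: shared hashes / all hashes."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_matrix(sketches):
    """Pairwise Jaccard matrix over a dict of name -> sketch, names sorted."""
    names = sorted(sketches)
    matrix = [[jaccard(sketches[x], sketches[y]) for y in names] for x in names]
    return names, matrix
```

As long as both sides use the same k and scaled, sketches built from reads and from assemblies can be compared the same way.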

The sourmash modules are in modules/local/nf-core-modified because I changed them from the nf-core versions: I updated the sourmash version and added a seqtype input that distinguishes reads from assemblies, so the distinction carries through to the output file names.
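To make the seqtype rationale concrete: if a sample is sketched both as reads and as an assembly, output names built from the sample ID alone would collide. The minimal Python sketch below shows the idea; the `SketchInput` record, the `reads`/`assembly` labels, and the `{sample}_{seqtype}_k{k}.sig` pattern are all hypothetical and not necessarily what the modified modules emit.

```python
from dataclasses import dataclass

# Hypothetical labels and naming pattern, for illustration only.
VALID_SEQTYPES = {"reads", "assembly"}


@dataclass(frozen=True)
class SketchInput:
    sample_id: str
    seqtype: str  # "reads" or "assembly" (assumed labels)


def signature_name(item: SketchInput, k: int = 21) -> str:
    """Build an output file name that records the input type and k-mer size."""
    if item.seqtype not in VALID_SEQTYPES:
        raise ValueError(f"unknown seqtype: {item.seqtype!r}")
    return f"{item.sample_id}_{item.seqtype}_k{k}.sig"


# Without seqtype, both inputs would map to the same name, e.g. "sampleA_k21.sig".
print(signature_name(SketchInput("sampleA", "reads")))     # sampleA_reads_k21.sig
print(signature_name(SketchInput("sampleA", "assembly")))  # sampleA_assembly_k21.sig
```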

This PR also includes fixes to the nanopore workflow that make the channel names clearer, which came up in #45.

elizabethmcd commented 1 year ago

I'd also appreciate feedback on k-mer size selection. For now it's set to k=21, but I'd like to parameterize it and choose a sensible default. @taylorreiter
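One way to parameterize the k-mer size is a comma-separated pipeline parameter, since a single sourmash signature file can hold sketches at several k-mer sizes. The Python sketch below assumes a hypothetical `ksizes` parameter defaulting to the current k=21 and turns it into a `-p` parameter string in the format `sourmash sketch dna` accepts (for example `k=21,k=31,scaled=1000`). The parameter name and the `scaled=1000` value are assumptions for illustration, not settings from this PR.

```python
DEFAULT_KSIZES = "21"  # mirrors the k=21 currently hard-coded in this PR


def parse_ksizes(value: str = DEFAULT_KSIZES) -> list[int]:
    """Parse a comma-separated k-mer size list such as "21,31,51"."""
    ksizes = set()
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        k = int(token)  # raises ValueError on non-numeric input
        if k <= 0:
            raise ValueError(f"k-mer size must be positive, got {k}")
        ksizes.add(k)
    if not ksizes:
        raise ValueError("at least one k-mer size is required")
    return sorted(ksizes)  # deduplicated, ascending


def sketch_params(ksizes: list[int], scaled: int = 1000) -> str:
    """Build a parameter string in the style of `sourmash sketch dna -p ...`."""
    return ",".join([f"k={k}" for k in ksizes] + [f"scaled={scaled}"])


print(sketch_params(parse_ksizes("31, 21")))  # k=21,k=31,scaled=1000
```

Sorting and deduplicating keeps output names and parameter strings stable regardless of how the user orders the list.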