bokulich-lab / nf-ducken

Workflow to process amplicon meta-analysis data, from NCBI accession IDs to taxonomic diversity metrics.

Incorporated Cutadapt for primer removal and binning #72

Closed lina-kim closed 12 months ago

lina-kim commented 12 months ago

Closes #51 and #62. Introduces Cutadapt to the workflow for both primer removal and binning. The workflow now takes a single FASTQ input artifact and splits it into N artifacts, one for each of the N primers provided by the user, though this may not be the most computationally efficient method.

As a result, this removes the former FASTQ split processes. I'm open to further discussion on whether we should bring that back for efficiency's sake.
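For illustration, here is a minimal Python sketch of the one-to-N split described above. It is not how Cutadapt or the workflow does it: it assumes exact prefix matching on the 5' end only, with no mismatch tolerance and no IUPAC codes. The function name `bin_reads_by_primer`, the `unassigned` bin, and the input shapes are hypothetical.

```python
from collections import defaultdict


def bin_reads_by_primer(reads, primers):
    """Assign each read to the first primer it starts with and trim that primer.

    reads:   iterable of (read_id, sequence) tuples (hypothetical input shape)
    primers: dict mapping a primer name to its sequence

    Returns a dict mapping primer name -> list of (read_id, trimmed_sequence).
    Reads that match no primer go to the hypothetical "unassigned" bin, untrimmed.
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        for name, primer in primers.items():
            # Exact 5' prefix match only; a simplifying assumption.
            if seq.startswith(primer):
                bins[name].append((read_id, seq[len(primer):]))
                break
        else:
            # No primer matched this read.
            bins["unassigned"].append((read_id, seq))
    return dict(bins)
```

With N primers this yields up to N bins (plus the unassigned one) from a single pass over the input, which mirrors the single-artifact-in, N-artifacts-out shape of the change.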

lina-kim commented 12 months ago

Merging now. @ChristosMatzoros, heads-up that these changes are coming. Unfortunately, the Cutadapt stats will take some wrangling to integrate into MultiQC, so take your time. Currently QIIME 2 only outputs Cutadapt stats as a text-based log file, so we'll have to do some text scraping, but it's not a priority for now.

If you're curious though, here's a toy example log: trimmed_stats.log
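As a rough starting point for the text scraping, here is a Python sketch that pulls `Label: count (percent%)` style lines out of a log. It assumes the log contains summary lines of that shape; the regular expression, the function name `scrape_stats`, and the output layout are all hypothetical and have not been checked against trimmed_stats.log.

```python
import re

# Matches lines such as "Some label:   1,234 (56.7%)" or "Some label:   1,234 bp".
# The line shape is an assumption about the log, not a documented format.
LINE_RE = re.compile(
    r"^\s*(?P<label>[^:]+):\s+(?P<count>[\d,]+)"
    r"(?:\s+bp)?(?:\s+\((?P<pct>[\d.]+)%\))?\s*$"
)


def scrape_stats(log_text):
    """Collect numeric summary lines from a text log into a dict.

    Returns {label: {"count": int, "percent": float or None}}.
    Lines that do not fit the assumed shape are skipped.
    """
    stats = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        label = match.group("label").strip()
        count = int(match.group("count").replace(",", ""))
        pct = match.group("pct")
        stats[label] = {
            "count": count,
            "percent": float(pct) if pct is not None else None,
        }
    return stats
```

If a log holds one summary block per sample, the text would need to be split per block first, since repeated labels overwrite each other in this sketch.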

(These changes should've come in a fork but the branch I originally created for minor changes ballooned, whoops!)