bokulich-lab / nf-ducken

Workflow to process amplicon meta-analysis data, from NCBI accession IDs to taxonomic diversity metrics.

Incorporate primer-based binning prior to primer removal #51

Closed lina-kim closed 12 months ago

lina-kim commented 1 year ago

The original issue was split: primer trimming is now tracked in #62. Cutadapt can't be used for binning because the QIIME 2 plugin requires an input artifact of type MultiplexedSingleEndBarcodeInSequence, not the SampleData[PairedEndSequencesWithQuality] type used for sequences downloaded with q2-fondue and typical of those downloaded from the SRA.

Instead, use VSEARCH to bin reads by primer, as suggested on the QIIME 2 Forum. Caveat: the alignment-only method of VSEARCH isn't wrapped in the QIIME 2 ecosystem!

To incorporate in two steps before denoising:

lina-kim commented 1 year ago

Might as well close. Makes sense to return to the original plan of custom artifact, split, followed by primer trimming. Unless I hear a strong case for binning?

lina-kim commented 1 year ago

Let's rethink this. The original one-step cutadapt binning method won't be possible without this feature or something along those lines.

If we could perform a simplified binning in a computationally efficient way, though, that would be great. The primary advantage is that we could run the workflow without requiring extensive input from the user, i.e. attaching a primer name, primer sequence, and truncation length to every single sample. With binning, the primer name, sequence, and truncation length can be supplied completely separately from the samples.
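
To make the idea concrete, here is a rough Python sketch of what a simplified binning could look like: each read is assigned to the first primer whose sequence matches the start of the read, allowing IUPAC degenerate bases and a small number of mismatches. This is only an illustration and not what nf-ducken implements; the `PRIMERS` table, the function names, the truncation lengths, and the `max_mismatches` default are all assumptions made up for this example.

```python
# Illustrative only: hypothetical primer table kept separate from the samples.
# name -> (forward primer sequence, truncation length); values are examples.
PRIMERS = {
    "515F": ("GTGYCAGCMGCCGCGGTAA", 150),
    "341F": ("CCTACGGGNGGCWGCAG", 250),
}

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_primer(read, primer, max_mismatches=1):
    """Return True if the read starts with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    mismatches = 0
    # zip stops at the end of the primer, so only the read prefix is compared.
    for base, code in zip(read.upper(), primer.upper()):
        if base not in IUPAC.get(code, ""):
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def bin_reads(reads, primers, max_mismatches=1):
    """Assign each (read_id, sequence) pair to the first matching primer."""
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for read_id, seq in reads:
        for name, (primer_seq, _trunc_len) in primers.items():
            if matches_primer(seq, primer_seq, max_mismatches):
                bins[name].append(read_id)
                break
        else:
            # No primer matched within the mismatch budget.
            bins["unassigned"].append(read_id)
    return bins
```

With something like this, the truncation length for a bin could be looked up from the primer table after binning, rather than being attached to each sample up front. A prefix match of this kind does not handle indels or primers that start at an offset into the read, which is where an alignment-based tool would still be needed.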