Closed: btupper closed this issue 2 years ago
I like it! So would filter and trim run in preprocess, and then we point to the outputs from that step in process? Assuming all looks good in user supervision?
Cutadapt should be the first step as all downstream steps assume primers have been trimmed off.
Preprocess through learn_errors(). Then process starting with filter_and_trim(), but with the option to skip over that part and go straight to run_dada().
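A minimal sketch of that split, assuming the wrappers named above (`filter_and_trim()`, `learn_errors()`, `run_dada()`) plus a hypothetical `run_cutadapt()` wrapper and a hypothetical `skip_filter` config flag; the names and signatures here are illustrative, not the pipeline's actual API:

```r
# requires the yaml package for reading the config files

preprocess <- function(cfg_file) {
  cfg <- yaml::read_yaml(cfg_file)
  run_cutadapt(cfg)       # hypothetical wrapper: primers come off before anything downstream
  filter_and_trim(cfg)    # first pass with the config's default parameters
  learn_errors(cfg)       # last preprocess step; outputs go to user review
  invisible(cfg)
}

process <- function(cfg_file) {
  cfg <- yaml::read_yaml(cfg_file)
  if (!isTRUE(cfg$skip_filter)) {
    filter_and_trim(cfg)  # re-run only if the user changed trim parameters
  }
  run_dada(cfg)           # the costly step, now run against a vetted config
}
```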
multistep ASV workflow
It seems that eDNA datasets are, at least for now, mostly edge cases - that is, each new sample submitted to the workflow brings unlooked-for qualities. The pipeline, in its original conception, was designed to be a simple drop-and-run process. That design makes it difficult to ascertain the needs of a particular dataset before running the costly dada and taxonomy-matching steps.
To accommodate the fluidity of eDNA datasets, we propose splitting the ASV workflow into at least three steps: preprocessing, user supervision, and processing.
1 Preprocess
- User generates config `input.yaml` (a hypothetical example follows this list)
- Preprocess reads `outdir/input.yaml` and saves the config
  - as archive `outdir/preprocess/input-preprocessed.yaml`
  - as `outdir/input-supervised.yaml`, the working copy for the next step
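For illustration only, the config might look something like the following; every key below is an assumed schema, not the pipeline's actual one (the `truncLen`/`maxEE` names echo dada2's filterAndTrim arguments):

```yaml
# hypothetical input.yaml; all keys illustrative
input_dir: /path/to/fastq               # raw paired-end reads
output_dir: outdir
cutadapt:
  forward_primer: GTGYCAGCMGCCGCGGTAA   # example 515F primer
  reverse_primer: GGACTACNVGGGTWTCTAAT  # example 806R primer
filter_and_trim:
  truncLen: [240, 160]
  maxEE: [2, 2]
skip_filter: false                      # hypothetical flag from the comment above
```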
2 User supervision
- User reviews the preprocess outputs: "Should I stay or should I go?"
- If a go, the user adjusts the config `outdir/input-supervised.yaml` (example edits below)
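In practice the supervision step might reduce to a few edits in `outdir/input-supervised.yaml`, for example (same hypothetical keys as above):

```yaml
filter_and_trim:
  truncLen: [220, 150]   # tightened after reviewing the quality profiles
skip_filter: false       # trim settings changed, so re-run filter_and_trim()
```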
3 Process
- Runs from the supervised config `outdir/input-supervised.yaml` (usage sketch below)
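Step 3 would then reduce to pointing the processor at the supervised config, e.g. with the sketch above:

```r
process("outdir/input-supervised.yaml")  # re-filters if needed, then run_dada()
```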