benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

DNAext -> sequencing -> (+mock fastq) ASV_determination #1334

Closed (csmiguel closed this issue 3 years ago)

csmiguel commented 3 years ago

Problem: I have MiSeq paired-end metabarcoding FASTQ files from bacteria, but no mock community was sequenced. I want to validate my bioinformatics workflow.

Question: As a partial solution, could I add DNA sequences from someone else's data (mock.1.fastq / mock.2.fastq) at the start of my pipeline? Are there any recommended links to mock FASTQ files? (I guess Mock_S280_L001_R1/2_001.fastq could work.) For validation, should I focus only on the mock taxa identified, or also evaluate abundances?

Conditions:
1. Sequence lengths have to be the same (300 bp), or I can truncate them to the length of the shortest.
2. Primers need to be the same, or I can add the mock reads after primer trimming.

Thanks
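For readers following along, condition 1 above (truncating reads to a common length) could be sketched roughly as below. This is a minimal illustration in plain Python, not part of dada2; the names `parse_fastq` and `truncate_records` are hypothetical, and dropping reads shorter than the target length is an assumption made here for simplicity.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one 4-line FASTQ record
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from simple 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def truncate_records(records: Iterable[FastqRecord], length: int) -> Iterator[FastqRecord]:
    """Cut every read (and its quality string) to `length` bases.

    Assumption for this sketch: reads shorter than `length` are dropped,
    so that all surviving reads have exactly the same length.
    """
    for header, seq, qual in records:
        if len(seq) < length:
            continue
        yield header, seq[:length], qual[:length]


if __name__ == "__main__":
    demo = [
        "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@read2\n", "ACG\n", "+\n", "III\n",
    ]
    for rec in truncate_records(parse_fastq(demo), 5):
        print(rec)
```

In practice the same function would be applied to both the forward and the reverse file, so that read pairs stay in sync only if a pair is kept or dropped together; that pairing logic is left out of this sketch.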

benjjneb commented 3 years ago

If your goal here is just to sanity check your workflow code, downloading a mock community from somewhere else and analyzing it in this way can be a good solution.

However, since error profiles differ between experiments, this approach won't be as effective for a fine-grained evaluation of your bioinformatics workflow on your own data, e.g. for fine-tuning parameters.

In either case, I would probably run the mock community separately from the rest of your data, since it will be coming from a separate experiment.
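As a rough illustration of such a sanity check, the sketch below compares the sequences inferred from a separately processed mock run against the known reference sequences of the mock strains. It is plain Python rather than dada2 code; the function name `evaluate_mock` and the input layout (a dict of inferred sequence to read count, and a dict of strain name to reference sequence) are assumptions made for this example. It also assumes that inferred sequences are in the same orientation as the references and counts only exact substring matches.

```python
from typing import Dict, List


def evaluate_mock(
    asv_counts: Dict[str, int],
    reference_seqs: Dict[str, str],
) -> Dict[str, List[str]]:
    """Presence/absence check of inferred sequences against mock references.

    asv_counts:     inferred sequence -> read count (hypothetical layout)
    reference_seqs: strain name -> reference sequence (hypothetical layout)

    An inferred sequence counts as expected if it occurs exactly inside
    at least one reference sequence.
    """
    recovered = set()
    unexpected = []
    for asv in asv_counts:
        hits = [name for name, ref in reference_seqs.items() if asv in ref]
        if hits:
            recovered.update(hits)
        else:
            unexpected.append(asv)
    missing = set(reference_seqs) - recovered
    return {
        "recovered": sorted(recovered),    # strains with at least one exact match
        "missing": sorted(missing),        # strains with no matching sequence
        "unexpected": sorted(unexpected),  # inferred sequences matching no strain
    }


if __name__ == "__main__":
    refs = {"strainA": "TTTACGTACGTTT", "strainB": "GGGCCCAAAGGG"}
    asvs = {"ACGTACG": 100, "TTTTTTTT": 3}
    print(evaluate_mock(asvs, refs))
```

This only reports which expected strains were or were not recovered and which inferred sequences match nothing in the mock; it does not evaluate abundances.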

csmiguel commented 3 years ago

Thanks, I found these two great mock resources:
https://github.com/caporaso-lab/mockrobiota
https://resendislab.github.io/microbiome/articles/mock_example.html