Estimate rate of index hopping with synthetic mocks

benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution

http://benjjneb.github.io/dada2/

GNU Lesser General Public License v3.0


Estimate rate of index hopping with synthetic mocks #1101

Closed: marissalee closed this issue 3 years ago

marissalee commented 4 years ago

Hi Ben and others, I am interested in estimating the rate of index hopping (aka tag switching or index bleed) in my Illumina MiSeq dataset.

I included synthetic "ITS" sequences (sensu Palmer et al. 2018) in my last Illumina MiSeq run and would like to calculate index hopping based on the rate at which these synthetic sequences show up in the biological samples. Ideally, I'd like to do this without having to download and use the software wrapper the authors developed (AMPtk), which does a bunch of other stuff that I don't need.

Do you have any recommendations about where (and how) in the dada2 pipeline to do this?

I was thinking the best way to get at this would be to use cutadapt where, instead of searching for the primers (as is done in the ITS dada2 tutorial), I’d search for the synthetic sequences.

I'd appreciate any and all suggestions! Thanks, Marissa

benjjneb commented 4 years ago

I think you probably want to do this after running through the DADA2 workflow to the sequence table.

@mikemc @adw96 (or David, but I don't know his GH handle): Any more concrete thoughts on this?

marissalee commented 4 years ago

Thanks for the speedy response! It seems reasonable to just work from the ASV table, allowing dada2 to account for sequencing error first.
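
For readers following along, here is a minimal sketch of what "working from the ASV table" could look like. It assumes the sequence table has been exported from R as per-sample ASV counts and that the ASVs matching the synthetic sequences have already been identified. The function name `spikein_bleed`, the sample and ASV labels, and the counts are all hypothetical, made up for illustration. The resulting fraction is only a rough proxy for index hopping, since well-to-well contamination would also contribute to it.

```python
def spikein_bleed(seqtab, synthetic_asvs, biological_samples):
    """Fraction of reads assigned to synthetic ASVs, per sample and pooled.

    seqtab: dict mapping sample name -> dict of ASV id -> read count
    synthetic_asvs: set of ASV ids that match the synthetic spike-ins
    biological_samples: samples that should contain no synthetic reads
    """
    per_sample = {}
    total_syn = 0
    total_reads = 0
    for sample in biological_samples:
        counts = seqtab[sample]
        n_reads = sum(counts.values())
        n_syn = sum(n for asv, n in counts.items() if asv in synthetic_asvs)
        # Guard against empty samples to avoid dividing by zero
        per_sample[sample] = n_syn / n_reads if n_reads else 0.0
        total_syn += n_syn
        total_reads += n_reads
    pooled = total_syn / total_reads if total_reads else 0.0
    return per_sample, pooled


# Toy counts, invented purely for illustration (not real data)
seqtab = {
    "soil_A": {"ASV1": 9990, "SYN1": 10},
    "soil_B": {"ASV1": 4000, "ASV2": 5995, "SYN1": 5},
    "synmock": {"SYN1": 20000, "ASV1": 4},
}
per_sample, pooled = spikein_bleed(seqtab, {"SYN1"}, ["soil_A", "soil_B"])
print(per_sample, pooled)
```

The pooled value weights samples by read depth, while the per-sample values show whether the bleed is concentrated in a few wells.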

mikemc commented 4 years ago

I have only done this starting from reads that have already been demultiplexed in BaseSpace / by the sequencing center. In this case, the index sequences should already be trimmed from the reads, and the reads are split into separate files. However, you know what the indexes should have been (based on the plate design / index assignment), and what they ended up being (based on how the reads were demultiplexed). So after running DADA2 as normal to get ASVs, and figuring out which ASV(s) correspond to the positive controls, you can use this info to estimate rates of index hopping in Index 1 and Index 2.
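
One possible reading of the approach described above, as a hedged sketch rather than the exact method used: take the positive-control ASV counts per sample, look up each sample's assigned (Index 1, Index 2) pair from the plate design, and attribute stray control reads to whichever index would have had to change. The function name `classify_hops`, the index labels, and the counts are hypothetical. The sketch also assumes that a stray read in a sample sharing only one index with a control reflects a hop in the other index, which ignores physical cross-contamination.

```python
from collections import Counter


def classify_hops(spike_counts, index_map, control_samples):
    """Tally positive-control reads by which index must have switched.

    spike_counts: dict sample -> reads assigned to positive-control ASVs
    index_map: dict sample -> (index1, index2) from the plate design
    control_samples: set of samples that truly contain the control
    """
    control_i1 = {index_map[s][0] for s in control_samples}
    control_i2 = {index_map[s][1] for s in control_samples}
    tally = Counter()
    for sample, n in spike_counts.items():
        if sample in control_samples:
            tally["expected"] += n
            continue
        i1, i2 = index_map[sample]
        shares_i1 = i1 in control_i1
        shares_i2 = i2 in control_i2
        if shares_i2 and not shares_i1:
            tally["index1_hop"] += n  # Index 2 matches, so Index 1 changed
        elif shares_i1 and not shares_i2:
            tally["index2_hop"] += n  # Index 1 matches, so Index 2 changed
        elif shares_i1 and shares_i2:
            tally["ambiguous"] += n  # only possible with several controls
        else:
            tally["neither_shared"] += n  # double hop or contamination
    total = sum(tally.values())
    rates = {k: v / total for k, v in tally.items()} if total else {}
    return tally, rates


# Toy plate design and counts, invented for illustration (not real data)
index_map = {
    "synmock": ("N701", "S501"),
    "s1": ("N702", "S501"),
    "s2": ("N701", "S502"),
    "s3": ("N702", "S502"),
}
spike_counts = {"synmock": 9990, "s1": 6, "s2": 4, "s3": 0}
tally, rates = classify_hops(spike_counts, index_map, {"synmock"})
print(tally, rates)
```

The rates here are expressed as a share of all control reads recovered, which is one of several reasonable denominators; a fuller treatment would also account for how many wells share each index with the control.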

It sounds like you are instead thinking of starting with reads where the index sequences are still present, and perhaps doing the demultiplexing yourself? That would also be fine but I haven't attempted or thought through doing things this way.

marissalee commented 4 years ago

Hi Mike -- Would you be up for a chat about mocks via Zoom? I've got a number of questions that I'd like to ask that are pretty far outside the scope of this issue. I'll send you an email.