benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Unusually high number of Reverse complement primer hits in 16S sequencing files #1562

Closed arijitnus closed 4 months ago

arijitnus commented 2 years ago

Hi, I am running a set of samples through the DADA2 pipeline and I found an unusually high number of reverse-complement (RevComp) primer hits. In my previous experience, the RevComp hits are generally low (~1k-2k). Could you please let me know the possible reasons?

image

Please refer to row 3, column 4 in this picture, which is around 65k, in contrast to row 2, column 4.

benjjneb commented 2 years ago

Seeing the Reverse-Complement of the REV primer on the forward reads is expected if the forward read is long enough to read through into the reverse primer. Is the amplicon you are sequencing shorter than the reverse reads?

It is worth noting that you aren't seeing the same in most of the reverse reads. Are the reverse reads shorter than the forward reads at this point?
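
For readers who want to check this kind of read-through outside of R, here is a minimal Python sketch of counting reads that contain the reverse complement of a primer, with IUPAC ambiguity codes expanded. It is not DADA2 code. The helper names (`reverse_complement`, `primer_regex`, `count_hits`) are made up for illustration, and the primer shown is a commonly cited 806R sequence used here only as an assumption; substitute the primer actually used in your library prep.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table, including ambiguity codes (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of an IUPAC nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer with ambiguity codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer))


def count_hits(primer, reads):
    """Count the reads that contain the primer anywhere in their sequence."""
    pattern = primer_regex(primer)
    return sum(1 for read in reads if pattern.search(read))


# ASSUMPTION: a commonly cited 806R primer; replace with your own REV primer.
REV = "GGACTACNVGGGTWTCTAAT"
REV_RC = reverse_complement(REV)

# Toy forward reads (hypothetical data): the first reads through into the
# reverse complement of the REV primer, the second does not.
forward_reads = [
    "ACGT" * 5 + "ATTAGATACCCTGGTAGTCC" + "AAAA",
    "ACGT" * 10,
]
rc_hits = count_hits(REV_RC, forward_reads)
```

A high `count_hits(REV_RC, forward_reads)` on real data would be consistent with the forward reads being longer than the amplicon, which is the situation described above.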

arijitnus commented 2 years ago

@benjjneb thanks for the note. Let me explain the primers and regions I'm using for amplification for you to better understand.

I'm using the 515F-806R primer pair, as suggested by the Earth Microbiome Project, targeting the V4 region (~254 bp), and we obtained 2x300 bp 16S amplicon data. So the amplicon is shorter than the reverse reads.

My primers also have Illumina adapters added, so the lengths of the FW and REV primers are 52 and 54 nt, respectively. Since our data is 2x300 bp paired-end, I am not sure I understand your question about whether the reverse reads could be shorter than the FW reads. Thanks again!

benjjneb commented 2 years ago

The Illumina adapters are not typically sequenced, just the primers. So your amplicon is shorter than your read length, and the read-through into the opposite primer is expected. There is an easy solution, though: just make sure to truncate your reads in filterAndTrim to be shorter than the sequenced amplicon (e.g. 250 nt or less, but long enough to overlap), and the primer read-through will be gone.
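
To make the "shorter than the amplicon, but long enough to overlap" constraint concrete, here is a rough Python sketch of the arithmetic. It assumes the primers are still on the reads when they are truncated, and that merging needs at least 12 nt of overlap (the default `minOverlap` of DADA2's `mergePairs`). The function names and the example lengths (19 nt and 20 nt primers without adapters, a 254 nt region between them) are illustrative assumptions, not values confirmed in this thread.

```python
def truncation_limits(fwd_primer_len, rev_primer_len, insert_len, min_overlap=12):
    """Return truncation bounds for reads that still carry their primers.

    A forward read contains the FWD primer, then the insert, then the reverse
    complement of the REV primer, so read-through starts after
    fwd_primer_len + insert_len bases (and symmetrically for the reverse read).
    """
    amplicon_len = fwd_primer_len + insert_len + rev_primer_len
    return {
        # Longest truncation length that stops before the opposite primer.
        "max_trunc_fwd": fwd_primer_len + insert_len,
        "max_trunc_rev": rev_primer_len + insert_len,
        # The two truncated reads together must span the amplicon plus overlap.
        "min_trunc_sum": amplicon_len + min_overlap,
    }


def truncation_ok(trunc_fwd, trunc_rev, limits):
    """Check that a pair of truncation lengths avoids read-through and still merges."""
    return (
        trunc_fwd <= limits["max_trunc_fwd"]
        and trunc_rev <= limits["max_trunc_rev"]
        and trunc_fwd + trunc_rev >= limits["min_trunc_sum"]
    )


# ASSUMED lengths, for illustration only; measure them on your own data.
limits = truncation_limits(fwd_primer_len=19, rev_primer_len=20, insert_len=254)

ok_pair = truncation_ok(250, 200, limits)    # below both limits, still overlaps
too_long = truncation_ok(300, 300, limits)   # untruncated 2x300 reads read through
too_short = truncation_ok(150, 140, limits)  # reads no longer overlap enough
```

Under these assumptions, a pair such as 250 and 200 for the forward and reverse truncation lengths passes, while leaving the reads at the full 300 nt does not. If the primers are removed before truncation, the bounds shift by the primer lengths.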

My guess is that you are seeing fewer read-through primer matches on the reverse reads simply because the sequence quality drops off a lot toward the end of those reads, which is typical of Illumina 2x300 sequencing.