benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
470 stars 142 forks source link

Big-data bind #1666

Closed Balakulandai closed 5 months ago

Balakulandai commented 1 year ago

Hi I am doing big-data analysis, were I'm using rbind to remove primers. When I use the maximum primer mismatch =0 the primerHit function is not able to predict the primers in the sequences Whereas, in max.mismatch=4 it showing high hits

My question is that good to use primer mismatch =4

Here are the details as follows

FWD <-"CCTACGGGNGGCWGCAG" #INSERT_FWD_PRIMER CCTACGGGNGGCWGCAG REV <-"GGACTACHVGGGTWTCTAAT" #INSERT_REV_PRIMER GGACTACHVGGGTWTCTAAT

allOrients <- function(primer) { # Create all orientations of the input sequence require(Biostrings) dna <- DNAString(primer) # The Biostrings works w/ DNAString objects rather than character vectors orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), RevComp = reverseComplement(dna)) return(sapply(orients, toString)) # Convert back to character vector }

FWD.orients <- allOrients(FWD) REV.orients <- allOrients(REV) FWD.orients REV.orients

check the primer length

primer_length_fwd <- str_length(FWD[1]) primer_length_rev <- str_length(REV[1])

primerHits <- function(primer, fn, maxi=0) {

Primers.SS<-rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs+ FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs+ REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs+ REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs

Primers.SS Forward Complement Reverse RevComp FWD.ForwardReads 0 0 0 0 FWD.ReverseReads 0 0 0 0 REV.ForwardReads 0 0 0 0 REV.ReverseReads 0 0 0 0

primerHits <- function(primer, fn, maxi=4) {

  • nhits <- vcountPattern(primer, max.mismatch = maxi, sread(readFastq(fn)))
  • return(sum(nhits > 0))
  • }

Primers.SS<-rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs+ FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs+ REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs+ REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs Primers.SS

             Forward Complement Reverse RevComp

FWD.ForwardReads 48016 0 0 0 FWD.ReverseReads 0 2 0 69 REV.ForwardReads 0 0 0 66 REV.ReverseReads 46901 0 0 0

Kindly give some suggestion Thank you for your time and suggestion.

benjjneb commented 1 year ago

To clarify, the primerHits command here is just detecting whether primers are on your reads, not removing them.

That said, it seems like your results suggest that the primers are not on your reads, if you are getting zero hits with max.mixmatch=0. If there really are those primers on the reads, they should be getting some hits with that setting.

Are you sure the primers are on the reads? A visual inspection of the starts of the first few reads in an example R1 fastq file would be a good place to start. Can you find the expected primers with your eyes? Then confirming with the sequencing provider that primers were (or were not) sequenced, and whether they did any pre-processing that could have removed them, would be in order.

Balakulandai commented 1 year ago

Thank you for the clarification

I have downloaded the sequence file from NCBI for my analysis, the sequence contains adaptor and primer, I used cutadapt to remove the adaptor and used primerHit for detecting the primer.

Screenshot 2023-01-03 at 3 07 07 PM

primerHits <- function(primer, fn, maxi=4) {

primerHits <- function(primer, fn) {

  • nhits <- vcountPattern(primer, sread(readFastq(fn)))
  • return(sum(nhits > 0))
  • }

Primers.SS<-rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs[[1]]),

  • FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs[[1]]),
  • REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs[[1]]),
  • REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs[[1]])) Primers.SS Forward Complement Reverse RevComp FWD.ForwardReads 66085 0 0 0 FWD.ReverseReads 0 0 0 0 REV.ForwardReads 0 0 0 0 REV.ReverseReads 0 0 0 0

Max= 4 and max =0 gave different hits Kindly classify which one is correct.

Thank you

benjjneb commented 1 year ago

It's hard to parse your code and results, the way they are formatted in your comment.

Usually, as long as the primer is correclty specified, only a small max.mixmatch is needed to detect the vast majority or primers. In general, I would not recommend max.mismatch=4 for a short primer sequence like you have here (would be about 20\% mismatches), as that invites spurious matching.