FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0
459 stars 149 forks source link

trimgalore and cutadapt in two steps? #147

Open marwa38 opened 1 year ago

marwa38 commented 1 year ago

Hi team I know that trimgalore is a wrapper around cutadapt but I want to check what I ran is considered fine I ran trimgalore to remove adaptors and primers at 5' and 3' ends but then

trim_galore -q 30 --phred33 --paired --gzip --fastqc --fastqc_args "--nogroup" -o ./trimGalored --length 20 --stringency 1 -e 0 --clip_R1 17 --clip_R2 21 *_S"$i"_L001_R1_001.fastq.gz *_S"$i"_L001_R2_001.fastq.gz

I found that I need to check for primers within the sequences themselves and I found and I removed them Things are ok that I did my analysis in that way? I don't want to reran the analysis again but I want to make sure that this is fine.

# Place filtered files in filtN/ subdirectory
fnFs.filtN <- file.path(path, "filtN", basename(fnFs)) # Put N-filterd files in filtN/ subdirectory
fnRs.filtN <- file.path(path, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)

#Check if primers are still inside (N, W and H nucleotides are identified by cutadapt)
FWD <- "CCTACGGGNGGCWGCAG"  
REV <- "GACTACHVGGGTATCTAATCC"

allOrients <- function(primer) {
  # Create all orientations of the input sequence
  require(Biostrings)
  dna <- DNAString(primer)  # The Biostrings works w/ DNAString objects rather than character vectors
  orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), 
               RevComp = reverseComplement(dna))
  return(sapply(orients, toString))  # Convert back to character vector
}
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients

# now ready to count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations. 
#Identifying and counting the primers on one set of paired end FASTQ files is sufficient, assuming all the files were created using the same library preparation, so we’ll just process the first sample.

primerHits <- function(primer, fn) {
  # Counts number of reads in which the primer is found
  nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
  return(sum(nhits > 0))
}
library (ShortRead)
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), 
      FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]), 
      REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]), 
      REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
#                   Forward Complement Reverse RevComp
# FWD.ForwardReads       0          0       0       0
# FWD.ReverseReads       0          0       0     112
# REV.ForwardReads       0          0       0       8
# REV.ReverseReads      99          0       0       0
FelixKrueger commented 1 year ago

Hi @marwa38

I am just trying to understand exactly what you were trying to achieve here. Just to say, Trim Galore attempts to identify the adapter used in your system, by looking for exact matches either the Illumina standard, Nextera or Illumina small RNA adapters. If there is a clear 'winner', that adapter sequence is selected for removal (without any IUPAC ambiguity codes).

Also, the sequences you specified are lacking the A from the A-tailing procedure. In other words, Trim Galore is not looking for these sequences, as they tend not to be read-through contaminations that inhibit read mapping, but often result from primer-dimers and the like. Such sequences do not align to any genome, so they get effectively pruned by the alignment process itself.

Is this the answer you were looking for?

marwa38 commented 1 year ago

Thank you for the discussion @FelixKrueger

Trim Galore attempts to identify the adapter used in your system,

Yeah I was having Nextera adaptors which is good trimGalore have removed.

Also, the sequences you specified are lacking the A from the A-tailing procedure.

I didn't mention any sequences in trimGalore code trim_galore -q 30 --phred33 --paired --gzip --fastqc --fastqc_args "--nogroup" -o ./trimGalored --length 20 --stringency 1 -e 0 --clip_R1 17 --clip_R2 21 *_S"$i"_L001_R1_001.fastq.gz *_S"$i"_L001_R2_001.fastq.gz

What do you mean regarding "A"? you mean that I can't use V3-V4 primers that I above mentioned in my cutadapt codes to remove primers? you mean that TrimGalore --clip_R1 17 --clip_R2 21 have already removed all primers?

FelixKrueger commented 1 year ago

I was referring the code snippets in the original post:

#Check if primers are still inside (N, W and H nucleotides are identified by cutadapt)
FWD <- "CCTACGGGNGGCWGCAG"  
REV <- "GACTACHVGGGTATCTAATCC"

but at second glance these sequences to not appear to be related to adapters after all - so please just ignore the comment about A-tailing.

I suppose my question is: why do you want to remove primer sequences at all? Trim Galore is really only meant to remove read-through adapter contamination and poor sequences from the 3'-end. Do you have a setup that requires you to remove and filter a lot more, including biases at the starts as well as ends of sequences, adapter contamination plus primer sequences?

marwa38 commented 1 year ago

Thanks for your reply and sorry for responding after a week @FelixKrueger . I think that removing primers along with the adaptors will make my dataset cleaner as part of the quality control. Is it a problem to remove primers? I am following the primer removal pipeline from here https://benjjneb.github.io/dada2/ITS_workflow.html

Thank you M

FelixKrueger commented 1 year ago

I just took a look at the dada2 workflow for ITS amplicon sequencing. I have to admit that I really don't feel that I know enough about 16S sequencing to give much advice on what to do, but it would appear that trimming of the amplicon primer sequences seems necessary. This doesn't necessarily mean that you couldn't accomplish your goals with Trim Galore, but you would have to know exactly what you want to be doing. Maybe it would just be easier to follow their workflow instead?

marwa38 commented 1 year ago

Thanks for your comments Just for more clarification; I removed the Nexetra adaptors using trimGalore as I don't know their sequences but I removed the primers using cutadapt as I know their sequences