FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

trimming multiple adapters from both reads #164

Closed mjbug closed 1 year ago

mjbug commented 1 year ago

Hello again,

I wanted to ask a clarifying question about trimming multiple adapters from paired end reads. I detected Nextera and small RNA 3' adapter. I then trimmed these by running trimgalore twice (first for nextera and second for small_rna). Is this acceptable and should I have changed any parameters in the second trimgalore run (e.g., length) or should I have used the -a and -a2 parameters?

When reading the below, I wasn't sure if "read 2" meant that it was the second time both paired end files would be read OR if it only meant for R2.fq.gz

-a2/--adapter2 Optional adapter sequence to be trimmed off read 2 of paired-end files.

If I were to run -a and -a2, would it look like the following?

trim_galore \
  --phred33 \
  --a nextera \
  --a2 small_rna \
  -e 0.1 \
  --length 20 \
  --output-dir [directory name] \
  --paired [path/to/files]

Thank you!

FelixKrueger commented 1 year ago

Trim Galore only trims a single set of adapters, specified with -a for Read 1, and -a2 for Read 2 (in case it differs from the first adapter). Note: the Illumina or Nextera adapters can both be trimmed with the same adapter sequence.
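To make the -a / -a2 split concrete, here is a minimal sketch of a paired-end call with an explicit adapter per read. The wrapper name trim_pair and the default output directory are made up for illustration; the sequence shown is the 12 bp Nextera adapter, passed for both reads.

```shell
# Hypothetical wrapper: trim one read pair with an explicit adapter per read.
# -a applies to the first file (Read 1), -a2 to the second file (Read 2).
trim_pair() {
    r1=$1; r2=$2; outdir=${3:-trimmed}   # default output directory is an assumption
    trim_galore --paired \
        -a  CTGTCTCTTATA \
        -a2 CTGTCTCTTATA \
        --length 20 -e 0.1 \
        --output_dir "$outdir" \
        "$r1" "$r2"
}
```

Usage would be something like trim_pair S30_R1.fastq.gz S30_R2.fastq.gz (file names are placeholders). Note that -a and -a2 expect a sequence; the built-in presets are selected with the --illumina, --nextera or --small_rna flags instead, so with identical sequences on both reads as above, --nextera should give the same result.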

So if you wanted to trim two different things you would indeed have to run 2 rounds of Trim Galore. However, for any given experiment you should only have used a single set of adapters, or are you combining a Nextera method with small RNA sequencing?! If you run the auto-detection (the default), you should see how many hits were found for Illumina, Nextera or smallRNA adapters, and typically there is one clear winner (which will then subsequently get trimmed). I don't think I can recall a situation where trimming two different types of adapter was ever the right thing to do...
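For anyone who wants to see such hit counts without running a trim, the following is a rough sketch that approximates the idea: count how many of the first reads contain the start of each standard adapter. The function name and the default of 1,000,000 reads are illustrative, and this is not Trim Galore's own code.

```shell
# Count how many of the first N reads contain each adapter prefix.
# Sequences: Illumina universal, Nextera and Illumina small RNA 3' adapter starts.
count_adapter_hits() {
    fq=$1; nreads=${2:-1000000}
    gunzip -c "$fq" | head -n $((nreads * 4)) | awk '
        NR % 4 == 2 {                       # sequence line of each 4-line record
            if (index($0, "AGATCGGAAGAGC")) illumina++
            if (index($0, "CTGTCTCTTATA"))  nextera++
            if (index($0, "TGGAATTCTCGG"))  smallrna++
        }
        END {
            printf "Illumina\t%d\nNextera\t%d\nsmallRNA\t%d\n", illumina, nextera, smallrna
        }'
}
```

Calling count_adapter_hits file_R1.fastq.gz prints one tab-separated count per adapter type; a single clearly dominant count is what would normally decide which adapter to trim.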

mjbug commented 1 year ago

Thanks for the clarification about -a and -a2!

I am not combining a Nextera method with small RNA sequencing. I didn't run autodetection, but looked at my fastqc reports (see attached) and noticed some had Illumina small RNA 3' adapter content.

[Attached FastQC images: S30_R1_fastqc, S45_R1_fastqc]

I have the separate trimmed files from when I ran trimgalore for Nextera. However, for 16S amplicon analysis, I have been using the files trimmed for both Nextera and small_rna. Do you think this would make a big difference in the analysis? I'm thinking that since I use qiime cutadapt trim-paired to pull out the 16S primers and then use qiime dada2 denoise-paired, I'm still extracting my region of interest? I'm no expert, so would appreciate any thoughts on this matter!

Thank you for such a quick response, especially on the weekend!

FelixKrueger commented 1 year ago

Hmm, the small RNA 'contamination' seems to come in at a very specific sequence range (~160bp), and then does not change any further. My guess would be that this is either a single contaminant, or potentially a sequence abundant in your data that accidentally happens to contain the small RNA sequence at that position. I suppose you could try to get to the bottom of this by finding sequences containing the smallRNA adapter, and seeing what it is, with something like:

gunzip -c file_R1.fastq.gz | grep TGGAATTCTCGG 

Maybe in combination with grep -B 1 -A 2 > affected_sequences.fastq
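One caveat with plain grep context flags is that GNU grep prints "--" separator lines between groups of matches, which would leave the output as invalid FASTQ. A possible alternative, sketched here with awk, keeps only whole 4-line records whose sequence line contains the motif. The function name is a placeholder.

```shell
# Write every 4-line FASTQ record whose sequence line contains the motif.
extract_reads_with_motif() {
    fq=$1; motif=$2
    gunzip -c "$fq" | awk -v motif="$motif" '
        NR % 4 == 1 { header = $0 }                            # @read name
        NR % 4 == 2 { seq = $0; keep = (index($0, motif) > 0) } # sequence line
        NR % 4 == 3 { plus = $0 }                              # "+" separator
        NR % 4 == 0 && keep { print header; print seq; print plus; print $0 }'
}
```

For example: extract_reads_with_motif file_R1.fastq.gz TGGAATTCTCGG > affected_sequences.fastq. This assumes standard FASTQ with exactly four lines per read.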

As this only affects ~2% of sequences, I doubt that the overall results change dramatically. Similarly, since you are not seeing Nextera read-through contamination until very late in the reads, I doubt the results would be dramatically different. If I were in your position, and this isn't smallRNA-seq, I would probably just go with the auto-detection (probably Nextera here) and move on.

mjbug commented 1 year ago

Gotcha, thank you for your input. I'll try gunzip and grep later.

Would you recommend not using the twice trimmed sequences (Nextera and small_rna) at all for later analysis? And if not, why?

FelixKrueger commented 1 year ago

I suppose you could use the dual trimmed files, as the effect will either be minimal or even unnoticeable.

When it comes down to writing up methods for a paper, however, you will have to admit that you trimmed the files for small RNA adapters in addition to the correct adapters because you didn't quite understand what you were supposed to be doing :) . So if this is going to end up in a publication I would probably just do the auto-detection - for anything else it should be fine to use it single or dual trimmed.

pcantalupo commented 2 months ago

So if you wanted to trim two different things you would indeed have to run 2 rounds of Trim Galore. However, for any given experiment you should only have used a single set of adapters, or are you combining a Nextera method with small RNA sequencing?! If you run the auto-detection (the default), you should see how many hits were found for Illumina, Nextera or smallRNA adapters, and typically there is one clear winner (which will then subsequently get trimmed). I don't think I can recall a situation where trimming two different types of adapter was ever the right thing to do...

What about this SARS2 Ribo-seq paper where they say "linker (CTGTAGGCACCATCAAT) and poly-A sequences were removed"? I'd like to use TrimGalore but there is no option to supply multiple adapters/sequences. fastp allows this with their adapter file https://github.com/OpenGene/fastp?tab=readme-ov-file#adapters (they say: --adapter_fasta to give a FASTA file to tell fastp to trim multiple adapters in this FASTA file). So, looks like there are use cases for multiple adapters. Or am I thinking about this incorrectly?

FelixKrueger commented 2 months ago

Hi Paul,

you can pass in a .fa file to allow trimming of multiple adapters. This is from the --help text:

...
At a special request, multiple adapters can also be specified like so:
-a  " AGCTCCCG -a TTTCATTATAT -a TTTATTCGGATTTAT"
-a2 " AGCTAGCG -a TCTCTTATAT -a TTTCGGATTTAT", or so:
-a "file:../multiple_adapters.fa"
-a2 "file:../different_adapters.fa"
Potentially in conjunction with the parameter "-n 3" to trim all adapters.
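
For the Ribo-seq case asked about, the file-based form might look like the sketch below. The linker is the one quoted from the paper; the poly-A length of 10, the FASTA record names, the helper names and the single-end layout are all assumptions for illustration.

```shell
# Write a small FASTA of adapters (names and poly-A length are assumptions).
write_adapter_fasta() {
    cat > "$1" <<'EOF'
>linker
CTGTAGGCACCATCAAT
>polyA
AAAAAAAAAA
EOF
}

# Hypothetical single-end call; -n 2 allows up to two adapter removals per read.
trim_multi() {
    write_adapter_fasta adapters.fa
    trim_galore -a "file:adapters.fa" -n 2 "$1"
}
```

With paired-end data a second file would be passed the same way via -a2 "file:...", as in the help text above.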

Does this answer your question?

pcantalupo commented 2 months ago

Yes, it does...I will try this. Thank you

Please update the documentation https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md#step-2-adapter-trimming. That is why I commented on this issue. I didn't think this functionality was implemented.

FelixKrueger commented 2 months ago

This has now been added to the documentation, and will find its way into the main branch with the next update.