Closed hkaspersen closed 2 years ago
Hi @hkaspersen
This behaviour is expected, and is a result of using a very strict stringency by default:
--stringency <INT> Overlap with adapter sequence required to trim a sequence. Defaults to a
very stringent setting of 1, i.e. even a single bp of overlapping sequence
will be trimmed off from the 3' end of any read.
As your adapter sequences both start with an A
(correctly auto-detected from AGATCGGAAGAGC
), none of the trimmed sequences will end in an A
. While this looks quite drastic in the FastQC report, it will certainly not have a noticable impact on mapping of 150bp long sequences.
Just for the background, Trim Galore was originally meant to assist with bisulfite sequencing, where sequences are not only aligned to a certain locus in the genome, but individual bases also take part in calling the methylation state of cytosines. For this purpose we set the --stringency
to a very strict setting of 1, I believe the default for Cutadapt on its own for genomic sequences is 3. While this did matter back somewhat in 2010 when sequences were only 28 or 36bp long, these days I think one can comfortably deal with generously removing suspected adapter sequences. Does this answer your question?
Thank you for a swift reply! Yes, this is very useful. I think we can safely ignore the issue then. Thanks!
Hello, and thanks for creating a great tool!
We used TrimGalore to trim shotgun metagenomic datasets sequenced on NovaSeq with the ThruPLEX library prep. We noticed that after trimming with the auto-detect setting, none of the reads in all our samples ended with an "A". We were wondering if this was something to do with the index sequence not being removed, or if something happened in the trimming? I'm not sure if this is something to worry about, just thought it was peculiar and would like to know why!
This is the command we used:
These are the adapter sequences that the sequencing provider used (not used in the trimming command): --adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
This is the fastqc report for per base sequence content before trimming:
And after: