Closed ag1805x closed 3 years ago
Hi @ag1805x
Trim Galore is intended to identify and remove read-through adapter contamination, which in your case is AGATCGGAAGAGC. TruSeq adapters or adapter dimers may be present as well, but that is a different issue from read-through contamination. As such, removing AGATCGGAAGAGC contamination but not GATCGGAAGAGC is both correct and expected.
In a bit more detail:
If you see TruSeq adapters in the sample that start with GATCGGAAGAGC..., this is really only the adapter, or a dimer of itself. This sequence will not align to any genome, so I simply do not bother about it; it will effectively be removed in the alignment step.
From the trimming point of view, the sequence AGATCGGAAGAGC, with the extra A from A-tailing, cannot produce a good match to the TruSeq adapter:
GATCGGAAGAGC...  "reference"
     | |
AGATCGGAAGAGC.   adapter
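To make the comparison above concrete, here is a minimal Python sketch that counts position-by-position matches between the two sequences. It is only an ungapped comparison meant to mirror the diagram; the helper name `positional_matches` is made up for this illustration, and it is not how Cutadapt actually aligns adapters.

```python
def positional_matches(reference, adapter):
    """Return 0-based positions where both sequences carry the same base
    when they are compared left to right without any gaps."""
    return [i for i, (r, a) in enumerate(zip(reference, adapter)) if r == a]


reference = "GATCGGAAGAGC"   # start of the TruSeq sequence reported by FastQC
adapter = "AGATCGGAAGAGC"    # adapter sequence used for trimming in this thread

matches = positional_matches(reference, adapter)
mismatches = len(reference) - len(matches)

print("matching positions:", matches)
print("mismatches over", len(reference), "compared bases:", mismatches)
```

Only two of the twelve compared positions agree, so an ungapped match of this kind would need a very permissive error rate, which is the point made in the next paragraph.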
The only option would be to allow so many mismatches that - as you said - it trims more or less random stuff as well.
I personally would only use Trim Galore in the default mode, and simply forget about some TruSeq dimer sequences (or maybe change the sample prep somewhat so you don't get these) as they drop out in the mapping step anyway.
As you have
I wouldn't have bothered much if it was RNA-Seq data. But this is WGBS data. I think in one of the tutorials it was mentioned: "adapter contamination may in a Bisulfite-Seq setting lead to mis-alignments and hence incorrect methylation calls". Can I afford to retain the adapters?
Using Trimmomatic does solve the issue though:
trimmomatic SE -threads 40 data.fastq.gz data_clean.fastq.gz ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 TRAILING:20 LEADING:20 MINLEN:20
The [read-through] adapter contamination, i.e. reads that have just a few bases of adapter on their 3' ends, may under certain circumstances be aligned to incorrect places (depending on mapping parameters).
Full length Illumina TruSeq adapters have no resemblance to the genome, and will thus not map (and also not result in incorrect methylation calls). I am pretty sure that removing or ignoring them will give the exact same results.
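If you want to check how many reads are affected before deciding to ignore them, something like the following Python sketch could help. It counts reads whose sequence starts with GATCGGAAGAGC in standard 4-line FASTQ records. The function names and the file name in the usage note are hypothetical, and a read that merely starts with this sequence is only a rough proxy for an adapter dimer.

```python
import gzip


def count_adapter_start_reads(lines, prefix="GATCGGAAGAGC"):
    """Count reads whose sequence starts with `prefix`.

    `lines` is any iterable of text lines in 4-line FASTQ layout
    (header, sequence, '+', qualities). Returns (hits, total_reads).
    """
    hits = 0
    total = 0
    for index, line in enumerate(lines):
        if index % 4 == 1:  # the second line of each record is the sequence
            total += 1
            if line.strip().startswith(prefix):
                hits += 1
    return hits, total


def count_in_gzipped_fastq(path, prefix="GATCGGAAGAGC"):
    """Convenience wrapper for a gzip-compressed FASTQ file (path is an assumption)."""
    with gzip.open(path, "rt") as handle:
        return count_adapter_start_reads(handle, prefix)
```

For example, `count_in_gzipped_fastq("data_trimmed.fq.gz")` (a placeholder file name) would return the number of such reads and the total read count, which gives a feel for how large the fraction is.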
I am trying to trim WGBS data using TrimGalore but I observed that it is unable to trim all occurrences of the Illumina adapter (AGATCGGAAGAGC) with default parameters. FastQC after trimming shows TruSeq adapter presence (mostly starting with GATCGGAAGAGC...).
I tried changing the -e parameter and here are my observations:
-e 0.1: (default) not all adapters removed
-e 0.05: adapter retention high (lower performance than 0.1)
-e 0.5: adapter removed but total sequences halved and number of duplicated sequences increased.
Should --stringency be low value (i.e. 1) and -e be high value (i.e. 1)? Is there any other parameter that could be adjusted to solve this issue?
Using -a GATCGGAAGAGC improved performance but in some cases adapters still remain.
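As a rough guide to what the -e values above mean for a full-length match, the sketch below converts each error rate into a number of tolerated errors for the 13-base adapter. It assumes the rule described in the Cutadapt documentation (Trim Galore runs Cutadapt for trimming), where the allowed errors are the matched length times the maximum error rate, rounded down. The helper name `allowed_errors` is made up for this example.

```python
import math


def allowed_errors(match_length, error_rate):
    """Errors tolerated for a match of `match_length` bases, assuming the
    floor(length * rate) rule from the Cutadapt documentation."""
    return math.floor(match_length * error_rate)


ADAPTER = "AGATCGGAAGAGC"  # 13 bases

for rate in (0.05, 0.1, 0.5):
    errors = allowed_errors(len(ADAPTER), rate)
    print(f"-e {rate}: up to {errors} error(s) in a {len(ADAPTER)}-base match")
```

Under that assumption, 0.05 tolerates no errors at all, 0.1 tolerates one, and 0.5 tolerates six of thirteen bases, which would be consistent with the heavy over-trimming seen at 0.5.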