Closed XiaoTW123 closed 2 years ago
The funky-looking 3' ends are simply a consequence of strict adapter trimming. Because the standard Illumina adapter starts with AGATC..., reads may never end in A (this could be the first base of the adapter, so it is trimmed). Equally, the endings AG, AGA, AGAT, AGATC, ... are all trimmed off the ends of reads. As a result, you get this drop of A at the very last base, with the other 3 bases taking over.
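The effect can be reproduced with a small simulation. The following is a minimal sketch, not Trim Galore's or Cutadapt's actual algorithm: it uses exact matching only, assumes a minimum overlap of one base, and uses only the five adapter bases quoted above. The names `ADAPTER_PREFIX`, `trim_3prime`, and `last_base_counts` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

# Only the first five adapter bases are given in this thread (assumption:
# exact matching against this prefix is enough to illustrate the effect).
ADAPTER_PREFIX = "AGATC"


def trim_3prime(read, adapter=ADAPTER_PREFIX, min_overlap=1):
    """Remove the longest read suffix that exactly matches a prefix of the adapter."""
    max_len = min(len(read), len(adapter))
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def last_base_counts(reads, read_len):
    """Count the final base among reads that are still full length after trimming."""
    counts = Counter()
    for read in reads:
        trimmed = trim_3prime(read)
        if len(trimmed) == read_len:
            counts[trimmed[-1]] += 1
    return counts


# Random reads with no real adapter contamination at all.
random.seed(0)
READ_LEN = 50
reads = ["".join(random.choice("ACGT") for _ in range(READ_LEN)) for _ in range(10000)]

counts = last_base_counts(reads, READ_LEN)
print(counts)  # A is absent at the final position; C, G and T share it
```

Even though these reads contain no adapter, every read ending in A (or AG, AGA, ...) is shortened, so only reads ending in C, G, or T remain at full length. That is the drop of A that FastQC shows at the last position.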
Is your data RNA-seq by any chance? In those cases one typically finds biased positions at the start, but they don't tend to interfere with anything dramatically (and are typically not removed). Removing them will probably only shift the start/end coordinates by 10bp.
Thanks for your reply. The things I described above happened in both my Illumina data and my Hi-C reads.
If by Illumina data you mean Illumina RNA-seq data, then that's to be expected (and I would probably not trim off the 5' ends). For the Hi-C data, you might want to check the protocol to see whether enzymatic restriction sites are expected at the 5' end and need to be kept in place.
In any case, I hope it is now clear that the phenomenon at the 3' end is expected and nothing to worry about.
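One way to look at the 5' ends of the Hi-C reads before deciding whether to clip them is to tally the first few bases of each read and see whether a single motif dominates. This is only a rough sketch: the helper names `read_fastq_sequences` and `leading_kmer_counts` are hypothetical, the toy records are made up for illustration, and which motif (if any) is expected depends on the enzyme used in the protocol.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def leading_kmer_counts(sequences, k=4):
    """Count the first k bases of every sequence that is at least k long."""
    counts = Counter()
    for seq in sequences:
        if len(seq) >= k:
            counts[seq[:k].upper()] += 1
    return counts


# Toy FASTQ records, invented for illustration only.
toy_fastq = [
    "@r1", "GATCAAAC", "+", "IIIIIIII",
    "@r2", "GATCTTTG", "+", "IIIIIIII",
    "@r3", "ACGTACGT", "+", "IIIIIIII",
]

kmer_counts = leading_kmer_counts(read_fastq_sequences(toy_fastq), k=4)
print(kmer_counts.most_common(3))
```

For a real file, the same functions can be fed the lines of a `gzip.open(path, "rt")` handle. If one motif accounts for most read starts and matches the protocol's restriction site, that argues against clipping the 5' end.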
Hi Felix Krueger, I ran FastQC before trimming and found that the 'per base sequence content' at the 5' end was not good, so I used the following command for quality control (removing adapters, low-quality reads, and 10 bp at the 5' end):
trim_galore --paired --quality 20 --fastqc --cores 8 --clip_R1 10 --clip_R2 10 --basename ngs_trimmed xx.R1.fastq.gz xx.R2.fastq.gz
After trimming, the 'per base sequence content' at the 5' end looks good, but the 3' end changed oddly. Is there anything wrong with my command? I hope to hear from you soon. Thanks.