FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

question about trimming #192

Closed hidvegin closed 5 months ago

hidvegin commented 5 months ago

Dear @FelixKrueger,

I have paired-end reads from an Illumina sequencer. I analyzed the reads with FastQC, but I cannot decide whether I should trim the 5' or 3' end of the R1 or R2 reads. I would like to use these reads for de novo assembly because we do not have a reference genome. I have attached the results here. Could you help me with this issue?

1RS-1BL_Mv179_S1_R1_001_fastqc.zip 1RS-1BL_Mv179_S1_R2_001_fastqc.zip

FelixKrueger commented 5 months ago

These files appear to have undergone some sort of trimming already, but presumably only for adapters and not for quality? There are three things I noticed, but I am not sure if any of them are worth looking at specifically; generally, the quality is very good.

  1. There is a tiny amount of poly-G coming through (this is probably when the sequencer ran out of signal). Trimming with e.g. -a GGGGGGGGGG should get rid of this.
  2. Trim Galore applies a default quality cutoff of Phred 20 - this might help a little.
  3. There appear to be some biases at the 5' end of your sequences; I am not sure if this is expected or not. If this is part of the method and the sequences are of genomic origin, you'll be fine. If you added some kind of artificial sequence, it might be worth trimming it off.

Also you could of course run 2 assemblies, one trimmed, and one 'as-is', and compare :)
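As a rough sketch of how the three points above might map onto Trim Galore calls (the file names and the clip length N_CLIP are placeholders, not values taken from this thread, and each command is only printed here rather than run):

```shell
#!/bin/sh
# Sketch only: R1/R2 are placeholder file names, not files from this thread.
R1="sample_R1.fastq.gz"
R2="sample_R2.fastq.gz"

# Point 1: trim poly-G read-through from the 3' end.
# In --paired mode, -a applies to read 1 and -a2 to read 2.
POLYG_CMD="trim_galore --paired -a GGGGGGGGGG -a2 GGGGGGGGGG $R1 $R2"

# Point 2: adapter auto-detection plus the default Phred 20 quality cutoff.
# -q 20 is written out only to make the default visible.
DEFAULT_CMD="trim_galore --paired -q 20 $R1 $R2"

# Point 3: remove a fixed number of bases from the 5' end of each read.
# N_CLIP is hypothetical; choose it from the per-base sequence content plot.
N_CLIP=10
CLIP_CMD="trim_galore --paired --clip_R1 $N_CLIP --clip_R2 $N_CLIP $R1 $R2"

# Print the commands for review; run one with, e.g.: eval "$POLYG_CMD"
printf '%s\n' "$POLYG_CMD" "$DEFAULT_CMD" "$CLIP_CMD"
```

For the "trimmed vs. as-is" comparison, the untrimmed input files would simply go into the assembler unchanged alongside whichever trimmed set is chosen.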

hidvegin commented 5 months ago

Thank you, @FelixKrueger, for your answer and help.

I tried two different parameter sets for TrimGalore based on your suggestions.

  1. I used the default parameters for TrimGalore:

1RS-1BL_Mv179_S1_R1_001_val_1_fastqc.zip 1RS-1BL_Mv179_S1_R2_001_val_2_fastqc.zip 1RS-1BL_Mv179_S1_R1_001.fastq.gz_trimming_report.txt 1RS-1BL_Mv179_S1_R2_001.fastq.gz_trimming_report.txt

  2. I used these parameters: -a GGGGGGGGGG --clip_R1 13 --clip_R2 13. These are the results:

1RS-1BL_Mv179_S1_R.zip 1RS-1BL_Mv179_S1_R1_001.fastq.gz_trimming_report.txt 1RS-1BL_Mv179_S1_R2_001.fastq.gz_trimming_report.txt

Based on the results, I think I should run two trimming passes. First, I should use the default parameters, so that TrimGalore cuts the Illumina universal adapter. Afterwards, I should use the -a GGGGGGGGGG --clip_R1 13 --clip_R2 13 parameters.

What do you think about it?
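A minimal sketch of the two-pass idea described above, for illustration only. It assumes that Trim Galore writes paired-end outputs to the working directory as <input basename>_val_1.fq.gz and <input basename>_val_2.fq.gz; val_name is a hypothetical helper, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Input names as they appear in the trimming report file names above.
R1="1RS-1BL_Mv179_S1_R1_001.fastq.gz"
R2="1RS-1BL_Mv179_S1_R2_001.fastq.gz"

# Hypothetical helper: derive the assumed pass-1 output name.
# $1 = input file, $2 = read number (1 or 2)
val_name() {
    base=$(basename "$1")
    base=${base%.gz}      # strip compression suffix
    base=${base%.fastq}   # strip .fastq if present
    base=${base%.fq}      # strip .fq if present
    printf '%s_val_%s.fq.gz\n' "$base" "$2"
}

P1_R1=$(val_name "$R1" 1)
P1_R2=$(val_name "$R2" 2)

# Pass 1: defaults (adapter auto-detection, Phred 20 quality trimming).
PASS1="trim_galore --paired $R1 $R2"

# Pass 2: poly-G trimming and 5' clipping on the pass-1 output.
PASS2="trim_galore --paired -a GGGGGGGGGG --clip_R1 13 --clip_R2 13 $P1_R1 $P1_R2"

printf '%s\n' "$PASS1" "$PASS2"
```

Note that the second pass runs on already-trimmed reads, so its trimming report would describe only what changed in that pass.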

FelixKrueger commented 5 months ago

That looks really quite clean... Just as a comment, you should also be able to do both of these actions in one go, by specifying


trim_galore --paired --clip_R1 13 --clip_R2 13 -a " AGATCGGAAGAGC -a GGGGGGGGGGGGGGG -n 2" -a2 " AGATCGGAAGAGC -a GGGGGGGGGGGGGGG -n 2" file1 file2

hidvegin commented 5 months ago

Thank you, @FelixKrueger, for your answer.