FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

TruSeq Methyl Capture EPIC library-specific options for trimming #103

Closed (freedomq8 closed this issue 4 years ago)

freedomq8 commented 4 years ago

Hi there,

I saw that there are RRBS-specific options for trimming and just wanted to know what I should include for WGBS data generated with the TruSeq Methyl Capture EPIC library kit. Is there a recommendation on what to include for this kit?

Thanks

FelixKrueger commented 4 years ago

I am afraid I don't have any specific experience with this kit, so I would recommend you stick to the instructions given by the kit protocol. Just generally, WGBS data typically doesn't require anything special but simply uses the most basic command:

trim_galore file.fastq.gz

However, certain pull-down methods often require hard-clipping of a few residues at the 5' or 3' ends, but I would need to see the FastQC html report (especially the base composition plot) to judge this better. Hope this helps, Cheers, Felix

freedomq8 commented 4 years ago

Hi there, sorry for my late reply to your post. Just to give an update and a correction for my issue: the kit that was used in our methylation project is not a WGBS kit but a targeted methylation kit. TruSeq Methyl Capture EPIC Library Prep Kit is the recommended replacement product.

I used RRBS parameters as recommended by a colleague, but I believe these settings are harsh:

trim_galore --quality 30 --phred33 --stringency 3 --length 50 --rrbs --trim1 --paired

I would be grateful if you could share your recommended filtering parameters for the epigenome/Methylseq-EPIC kit.

Thanks

FelixKrueger commented 4 years ago

I am afraid I have limited experience with this library kit (read: I have never used it before), so it is tricky to give accurate recommendations. I suppose the most relevant would be whether or not there are some biased positions at the start that would need to be trimmed off before the alignment step. Examples for this can be seen here: https://sequencing.qcfail.com/articles/mispriming-in-pbat-libraries-causes-methylation-bias-and-poor-mapping-efficiencies/. In short: if you see a biased composition of bases at the start, low mapping efficiencies, and/or biases in the M-bias plot you probably should add trimming on the 5' end. If you could send a FastQC html report for me to take a look at the base composition, and the M-bias report I could probably comment on this further.
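To make the "biased composition of bases at the start" check a bit more concrete, here is a minimal Python sketch that tabulates the percentage of each base at the first few read positions. This is not part of Trim Galore or FastQC; the function name `base_composition` and the default of 10 positions are assumptions made for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). Positions whose percentages differ clearly from the rest of the read would be candidates for 5' clipping.

```python
from collections import Counter


def base_composition(lines, n_positions=10):
    """Percentage of A/C/G/T/N at each of the first n_positions of the reads.

    `lines` is any iterable of FASTQ lines (hypothetical helper, assumes
    strict 4-line records: header, sequence, '+', quality).
    """
    counts = [Counter() for _ in range(n_positions)]
    for line_no, line in enumerate(lines):
        if line_no % 4 != 1:  # only the 2nd line of each record is sequence
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1

    result = []
    for counter in counts:
        total = sum(counter.values())
        if total == 0:
            result.append({})  # no read was long enough to cover this position
        else:
            result.append({b: 100.0 * counter[b] / total for b in "ACGTN"})
    return result
```

For a gzipped file this could be called as `base_composition(gzip.open("file.fastq.gz", "rt"))` after `import gzip`; the file name here is only a placeholder.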

Finally, regarding the command above:

I hope this helps?

freedomq8 commented 4 years ago

Thanks Felix for the tips. I have uploaded the FastQC results of the raw FastQ files and of the mapped SAM file produced with the parameters below. I also included the M-bias report (see the Dropbox link below).

bismark -n 1 --bowtie2 --path_to_bowtie bowtie2-2.4.1-linux-x86_64 hg19 1.fastq 2.fastq --sam -p 4

https://www.dropbox.com/s/mpz2zq72a7mjtee/QC.zip?dl=0

I do have another query. Do I need to do a deduplication step for this kit ?

Thanks

FelixKrueger commented 4 years ago

Thanks for the reports. The FastQC reports don't appear to come from truly raw data, as the sequences have at least been length-trimmed (35-77 bp) and probably also adapter- and quality-trimmed. The base composition plot shows a few blips in the curve for the first few positions, but it is difficult to tell from the FastQC reports.

The M-bias reports also show some irregularities in the first ~6 bp of both R1 and R2. In addition, you see a typical drop in the methylation levels at the start of R2 (see here: https://sequencing.qcfail.com/articles/library-end-repair-reaction-introduces-methylation-biases-in-paired-end-pe-bisulfite-seq-applications/). So if it was my library I would probably re-run the trimming with a command like this:

trim_galore --paired --clip_r1 6 --clip_r2 6 file_R1.fastq.gz file_R2.fastq.gz
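As an illustration of what hard-clipping 6 bp from the 5' end means for a single read, here is a small Python sketch. It is not Trim Galore's actual implementation; the helper name `clip_5prime` and the tuple layout of a record are assumptions made for this example. The point is simply that the first n bases and their matching quality characters are dropped together, so the sequence and quality strings stay the same length.

```python
def clip_5prime(record, n):
    """Drop the first n bases (and their quality scores) from one FASTQ record.

    `record` is a hypothetical (header, sequence, plus_line, quality) tuple.
    """
    header, seq, plus, qual = record
    # Sequence and quality are sliced identically so they remain aligned.
    return (header, seq[n:], plus, qual[n:])


read = ("@read1", "ACGTACGTAA", "+", "IIIIIIIIII")
print(clip_5prime(read, 6))
```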

Regarding de-duplication, this has pretty much been discussed in detail here already: https://github.com/FelixKrueger/Bismark/issues/356#issuecomment-651309287

In your specific case, FastQC estimates a duplication rate of ~30%. For capture data it is often a good idea not to deduplicate, but you could of course experiment with this to see whether it would make any difference at all.

freedomq8 commented 4 years ago

This is really helpful, thanks for your efforts and quick reply.