abearab closed this issue 1 year ago
This sounds a little odd. Would you be able to provide some test sequences for me to take a look at? Some 200K untrimmed, gzipped reads should fit in an email. Cheers
Yeah, it is. I'll send the test fastq files shortly.
Hi Abe,
I really don't know what went wrong, but your data looks absolutely lovely!
The data appears to be Accel Swift data, with a hefty bias of Gs at the start of Read 2:
According to our trimming recommendations for this type of library (see here) I went ahead and trimmed the data like so:
trim_galore --clip_R1 10 --clip_R2 20 --three_prime_clip_R1 10 --three_prime_clip_R2 10 --paired 5b_R1.fastq.gz 5b_R2.fastq.gz
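For reference, the same invocation can be wrapped in a small shell function so that the number of bases clipped from the 5' end of Read 2 is easy to vary. This is only a sketch: `build_trim_cmd` is a hypothetical helper name, not part of Trim Galore. The flags and file names are the ones from the command above, written with the capitalisation used in the Trim Galore documentation. The function only prints the command, so nothing is trimmed until you run the printed line yourself.

```shell
# Sketch: print a Trim Galore command for the paired-end files above.
# build_trim_cmd is a hypothetical helper, not part of Trim Galore.
# $1 = bases to clip from the 5' end of Read 2 (defaults to 20, as above).
build_trim_cmd() {
    printf 'trim_galore --clip_R1 10 --clip_R2 %s --three_prime_clip_R1 10 --three_prime_clip_R2 10 --paired 5b_R1.fastq.gz 5b_R2.fastq.gz\n' "${1:-20}"
}

# Print the command with the default 20 bp clip on Read 2.
build_trim_cmd
```

Calling `build_trim_cmd 15` prints the same command with `--clip_R2 15` instead.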
Then, using default Bismark alignments to the human genome I achieved the following stats:
Sequence pairs analysed in total: 192138
Number of paired-end alignments with a unique best hit: 153633
Mapping efficiency: 80.0%
...
C methylated in CpG context: 80.2%
C methylated in CHG context: 0.7%
C methylated in CHH context: 0.7%
Which to me looks very good indeed. You might get away with trimming only 15bp from R2, but this is really personal preference. I hope this helps!
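As a quick sanity check, the mapping efficiency in the report above is the number of pairs with a unique best hit divided by the total number of pairs analysed. A minimal sketch of that arithmetic, using the two counts quoted above (`mapping_efficiency` is a hypothetical helper name, not a Bismark command):

```shell
# mapping_efficiency is a hypothetical helper, not part of Bismark.
# $1 = pairs with a unique best hit, $2 = pairs analysed in total.
# Prints the percentage rounded to one decimal place.
mapping_efficiency() {
    awk -v hits="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * hits / total }'
}

# Counts taken from the report above: 153633 of 192138 pairs.
mapping_efficiency 153633 192138   # prints 80.0
```

The same helper can be pointed at the counts from any other run to compare trimming settings on equal terms.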
Cool, thanks for doing this. I can also get the same Mapping efficiency!! I'm closing this issue for now, I'll stay in touch if I have more questions :)
Hi There,
I need some technical assistance. I'm trying to set up a pipeline for WGBS data prepped using the xGen™ Methyl-Seq DNA Library Prep Kit. I could get ~60% mapping efficiency using the TrimGalore + Bismark pipeline with TrimGalore's auto-detection of adapter sequences. Now I am aiming for a higher mapping efficiency, as promised here, i.e. 70%. To my understanding this is their suggestion:
Here is my script, which made the mapping efficiency even worse, ~40%:
@FelixKrueger, do you have any idea how this can be resolved? I've had a great experience asking technical questions here, and your code maintenance and responsiveness are appreciated!