Closed mo716 closed 3 years ago
Hi @mo716,
From what you are describing, the most likely explanation would be: your libraries do not appear to look like PBAT libraries! The easiest way to check this would be to look at the base composition plot of Read 1. If it is low in C
, and high in G
, the library is directional. Read 2 will be its reverse complement, so low in G
and hight in C
.
The simplest solution would then be to simply run the samples in default (directional) mode (even though I think reversing R1 and R2 and using PBAT mode will do the same thing, only the strand assignments will be to the CTOT and CTOB strands instead of the intended OT and OB strands).
If the libraries aren't PBAT, I wonder if the trimming wasn't a bit excessive (you are losing 36 bp for each read pair outright)? I'd be happy to look at some QC plots, or if a send a few reads over the (200K should be enough, and fit into an email) I can run a quick test over here.
Finally, I don't think -p 20
won't help you a great deal, I think there is a diminishing return after anything more than -p 4
. So maybe you can save a few resources.
Hi @FelixKrueger ! Thank you very much for your fast reply and help! I just sent you by e-mail the fastqc files and some small subsets of the raw and trimmed reads for a given sample.
Thanks for that, I'm on it (I already think I know what it is, will report back within the hour).
Hi again,
I am pretty certain your libraries were constructed with the Accel Swift kit. I believe this kit uses a post-bisulfite method, but it does not in fact give rise to PBAT style libraries. Instead, it creates directional libraries, with a very pronounced bias of Gs at the start of Read 2.
Therefore, the right way to go about this is to trim the start of both R1 and R2 (which you have accidentally already done, even though not completely), and potentially also the last bases on the 3' end to avoid reading into the biases from the other read in case of read-through, but I am not sure if that is so relevant. We have written this up a little more here: https://github.com/FelixKrueger/Bismark/blob/master/Docs/README.md#ix-notes-about-different-library-types-and-commercial-kits
So I have just used a very quick trimming run like this:
trim_galore --clip_r1 10 --clip_r2 15 --three_prime_clip_r1 10 --three_prime_clip_r2 10 --paired raw_E11_1W1_L1_R1.fastq.gz raw_E11_1W1_L1_R2.fastq.gz
and then ran a standard Bismark command on the rapeseed genome:
bismark --genome genome/ -1 raw_E11_1W1_L1_R1_val_1.fq.gz -2 raw_E11_1W1_L1_R2_val_2.fq.gz
This results in a ~62% mapping efficiency (and ~23% ambiguous) alignments. So I would say: all good, swap the command line options, and all will be fine!
Cheers, Felix
Hi @FelixKrueger ! Thank you very much, that was indeed helpful. I will trim the reads again based on your recommendations for the Accel Swift kit and re-do the alignments as you suggested Have a great day! :)
Hi!! I am struggling a bit with the alignment of my 150 paired-end read PBAT libraries. I removed any adapters and trimmed 9 basepairs from the 5' and 3' end of both reads using TrimGalore v.0.6.6 as follows:
trim_galore -j 4 --paired --basename ${OUTPUT} --clip_r1 9 --clip_r2 9 --three_prime_clip_r1 9 --three_prime_clip_r2 9 --path_to_cutadapt /PATH/TO/CUTADAPT ${LIB1} ${LIB2}
Then I aligned the reads with Bismark v0.23.0 using the --pbat and --unmapped options as shown below:
bismark --parallel 4 --pbat --unmapped -p 20 --genome /PATH/TO/GENOME/ -1 ${READ1} -2 ${READ2}
Nevertheless, the average mapping efficiency for most libraries with this method is around 0.2%. Here is one example of a PE alignment report for sample E11:
I proceeded with aligning the unmapped reads separately as single end reads for the same sample (E11) as the one above based on the "Dirty Harry" method:
bismark --parallel 4 --pbat -p 20 --genome /PATH/TO/GENOME/ --se E11_unmapped_reads_1.fq.gz
bismark --parallel 4 -p 20 --genome /PATH/TO/GENOME/ --se E11_unmapped_reads_2.fq.gz
The mapping efficiency for the SE alignments were still very low:
Unmapped READ1 SE output (--pbat)
Unmapped READ2 SE output (--directional)
Since this looked a bit strange, I then repeated the SE alignment with the --pbat option for READ2 instead (and directional or default for READ1). Here are the results:
Unmapped READ1 SE OUPUT (--directional)
Unmapped READ2 SE output (--pbat)
The mapping efficiency improved and raised to 60%. So I thought that maybe there was a problem with READ1 actually being READ2 and vice-verse, but that is not the case. I checked the fastq file info for READ1 and they all have a "1" in the 8th column info.; whereas READ2 has a "2".
On a final attempt to understand this better, I ran the PE alignment again for sample E11 but with READ1 as READ2 and vice-verse:
bismark --parallel 4 --pbat --unmapped -p 20 --genome /PATH/TO/GENOME/ -1 ${READ2} -2 ${READ1}
The output from this is:
This looks quite decent, without having to go trough the "Dirty Harry" method. I am working with an allopolyploid, highly repetitive plant genome, so mapping efficiencies above 60% for a methylation based alignment are quite good! Nevertheless, I do not understand why READ1 seems to function as READ2 and the other way around. The kit used for the bisulfite convertion is EZ DNA Methylation Gold Kit (Cat No. D5005) from Zymo Research.
I would really appreciate your feedback on this. I did a fastqc for all libraries and merged them with multiqc, so I can send it to you if it helps. Thank you very much for your patience and any help! Cheers,