FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
386 stars 101 forks source link

about Bismark default run #409

Closed Yuxinagli closed 3 years ago

Yuxinagli commented 3 years ago

Hi Felix,

Thank you very mcuh for providing Bismark and help. My question is why I got different mapping ratio results with similar parameters. I ran two Bismark with same sample as follow: command 1: bismark --bowtie2 --score_min L,0,−0.2 --gzip -X 600 -un --parallel 12 --bam /home/yli/ref/genome/female.mm10 -1 $sample_R1_val_1_val_1.fq.gz -2 $sample_R2_val_2_val_2.fq.gz -o ./${sample}_bismark_bowtie_mm10_score_min command 2: bismark --bowtie2 --gzip -X 600 -un --parallel 12 --bam /home/yli/ref/genome/female.mm10 -1 $sample_R1_val_1_val_1.fq.gz -2 $sample_R2_val_2_val_2.fq.gz -o ./${sample}_bismark_bowtie_mm10_score_min

The only difference is I specified the " --score_min L,0,-0.2" in command1. Especially, from the *PE.report.txt files, I can see the bismark run under same parameters, but I am confused by the ~10% gap of mapping ratio difference between these two runs. Could you please help me with this?

Thanks,

Yuxiang Li

*PE.reprot.txt for command 1:

Bismark report for: wgbs_2_R1_val_1_val_1.fq.gz and wgbs_2_R2_val_2_val_2.fq.gz (version: v0.22.3) Bismark was run with Bowtie 2 against the bisulfite genome of /home/yli/ref/genome/female.mm10/ with the specified options: -q --score-min L,0,−0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 600 Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed)

Final Alignment report

Sequence pairs analysed in total: 11968824 Number of paired-end alignments with a unique best hit: 7811622 Mapping efficiency: 65.3% Sequence pairs with no alignments under any condition: 3084750 Sequence pairs did not map uniquely: 1072452 Sequence pairs which were discarded because genomic sequence could not be extracted: 2

Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 3907390 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 3904230 ((converted) bottom strand) ... ...


*PE.report.txt for command 2:


Bismark report for: wgbs_2_R1_val_1_val_1.fq.gz and wgbs_2_R2_val_2_val_2.fq.gz (version: v0.22.3) Bismark was run with Bowtie 2 against the bisulfite genome of /home/yli/ref/genome/female.mm10/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 600 Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed)

Final Alignment report

Sequence pairs analysed in total: 11968824 Number of paired-end alignments with a unique best hit: 8928772 Mapping efficiency: 74.6% Sequence pairs with no alignments under any condition: 1428756 Sequence pairs did not map uniquely: 1611296 Sequence pairs which were discarded because genomic sequence could not be extracted: 2

Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 4474195 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 4454575 ((converted) bottom strand) ... ...

FelixKrueger commented 3 years ago

Hi @Yuxinagli

As a general statement, I would say that both commands should result in the very same output, since your command 1 only specifies exactly that the default mode would set anyway. Let's have a look at the elements of your command:

bismark --bowtie2 --score_min L,0,−0.2 --gzip -X 600 -un --parallel 12 --bam /home/yli/ref/genome/female.mm10 -1 $sample_R1_val_1_val_1.fq.gz -2 $sample_R2_val_2_val_2.fq.gz -o ./${sample}_bismark_bowtie_mm10_score_min

Until here, most is fine and should not have any impact on the results. Two things I find in triguing:

Let me know how you get on. Cheers, Felix

Yuxinagli commented 3 years ago

Thank you very much for the reply, I wil try to reduce the parallel threads. For the trimming, It was performed per recommendation of library preparation kit provider, the first step to trim the adaptor, the second step is to hard trim 15 bp synthetic tail during Adaptase step.

Thanks,

Yuxiang

FelixKrueger commented 3 years ago

I hope this will help... Regarding the adaptase, is it the Accel Swift kit? Since the adaptase alters the 5' end, and the adapter contamination is on the 3' end, you should be able to do both in a single step, e.g.:

trim_galore --clip_r1 15 --clip_r2 15 --paired file_R1.fastq.gz file_R2.fastq.gz

Good luck!

Yuxinagli commented 3 years ago

Hi, Felix,

Thank you for your help. You are right, it's the Swift kit. I will try your recommendation to simplify the trimming step. The issue was persistent after I changed the " --parallel" setting.  After scrutinization I found the minus symbol in  command 1 is "−" but "-". The bismark takes it as "-" at the initiation step but not while running the mapping score check. So I got this warning "Argument "▒M-^HM-^R0.2" isn't numeric in multiplication (*) at /home/yli/src/Bismark/bismark line 3806, <__ANONIO__> line 31." in the standard output. And this warning didn't affect the results report, which I just missed before. Thank you again for your kind help.

Thanks, Yuxiang

FelixKrueger commented 3 years ago

Oh wow, this is really tricky to spot! I suppose we could try to add a test for the - symbol, but I am actually suprised that Bowtie2 doesn't complain about the strange dash you used instead of a minus sign... In any case, I'm glad it seems to be working now!