Yuxinagli commented 3 years ago

Hi Felix,

Thank you very mcuh for providing Bismark and help. My question is why I got different mapping ratio results with similar parameters. I ran two Bismark with same sample as follow: command 1: bismark --bowtie2 --score_min L,0,−0.2 --gzip -X 600 -un --parallel 12 --bam /home/yli/ref/genome/female.mm10 -1 $sample_R1_val_1_val_1.fq.gz -2 $sample_R2_val_2_val_2.fq.gz -o ./${sample}_bismark_bowtie_mm10_score_min command 2: bismark --bowtie2 --gzip -X 600 -un --parallel 12 --bam /home/yli/ref/genome/female.mm10 -1 $sample_R1_val_1_val_1.fq.gz -2 $sample_R2_val_2_val_2.fq.gz -o ./${sample}_bismark_bowtie_mm10_score_min

The only difference is I specified the " --score_min L,0,-0.2" in command1. Especially, from the *PE.report.txt files, I can see the bismark run under same parameters, but I am confused by the ~10% gap of mapping ratio difference between these two runs. Could you please help me with this?

Thanks,

Yuxiang Li

*PE.reprot.txt for command 1:

Bismark report for: wgbs_2_R1_val_1_val_1.fq.gz and wgbs_2_R2_val_2_val_2.fq.gz (version: v0.22.3) Bismark was run with Bowtie 2 against the bisulfite genome of /home/yli/ref/genome/female.mm10/ with the specified options: -q --score-min L,0,−0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 600 Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed)

Final Alignment report

Sequence pairs analysed in total: 11968824 Number of paired-end alignments with a unique best hit: 7811622 Mapping efficiency: 65.3% Sequence pairs with no alignments under any condition: 3084750 Sequence pairs did not map uniquely: 1072452 Sequence pairs which were discarded because genomic sequence could not be extracted: 2

Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 3907390 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 3904230 ((converted) bottom strand) ... ...

*PE.report.txt for command 2:

Bismark report for: wgbs_2_R1_val_1_val_1.fq.gz and wgbs_2_R2_val_2_val_2.fq.gz (version: v0.22.3) Bismark was run with Bowtie 2 against the bisulfite genome of /home/yli/ref/genome/female.mm10/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 600 Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed)

Final Alignment report

Sequence pairs analysed in total: 11968824 Number of paired-end alignments with a unique best hit: 8928772 Mapping efficiency: 74.6% Sequence pairs with no alignments under any condition: 1428756 Sequence pairs did not map uniquely: 1611296 Sequence pairs which were discarded because genomic sequence could not be extracted: 2

Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 4474195 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 4454575 ((converted) bottom strand) ... ...

FelixKrueger commented 3 years ago

Hi @Yuxinagli

As a general statement, I would say that both commands should result in the very same output, since your command 1 only specifies exactly that the default mode would set anyway. Let's have a look at the elements of your command:

bismark --bowtie2 --score_min L,0,−0.2 --gzip -X 600 -un --parallel 12 --bam /home/yli/ref/genome/female.mm10 -1 $sample_R1_val_1_val_1.fq.gz -2 $sample_R2_val_2_val_2.fq.gz -o ./${sample}_bismark_bowtie_mm10_score_min

--bowtie2: default, just lose it
--score_min L,0,−0.2: default, not needed
--gzip: I think this just affects the size of temporary files, not needed unless you are struggling with (temporary disk space)
-X 600: probably not required as most BS-seq is ~50-200bp, but shouldn't hurt (alignments might take a little longer though)
-un: not relevant here
--bam: default, just lose it
/home/yli/ref/genome/female.mm10: genome. Fine
-o ./${sample}_bismark_bowtie_mm10_score_min: should make this output directory for result files.

Until here, most is fine and should not have any impact on the results. Two things I find in triguing:

-1 $sample_R1_val_1_val_1.fq.gz -2 $sample_R2_val_2_val_2.fq.gz: The files appear to have been trimmed with Trim Galore. But why twice? While this almost certainly isn't necessary (I'd be happy to assist with trimming questions if you had any), it shouldn't impact the results, either.
- --parallel 12: This is what could be a suspect of having an impact. It looks like you are running alignments to the mouse genome, so I would assume that you will at least need ~3 cores, and 13GB of RAM (there are some genome references around that could use even more RAM). So with --parallel 12 you will need 12 times these resources, so ~40 cores and > 150GB of RAM. Do you have that much available? In case there are any other processes running on the machine/node which would kill some of the alignment threads, it could look like the alignment streams are empty, but not as result of no alignments but instead because they were terminated by the OS. A solution to this problem could be to go down with the resource requirement, e.g. try --parallel 2 for a start?

Let me know how you get on. Cheers, Felix

Yuxinagli commented 3 years ago

Thank you very much for the reply, I wil try to reduce the parallel threads. For the trimming, It was performed per recommendation of library preparation kit provider, the first step to trim the adaptor, the second step is to hard trim 15 bp synthetic tail during Adaptase step.

Thanks,

Yuxiang

FelixKrueger commented 3 years ago

I hope this will help... Regarding the adaptase, is it the Accel Swift kit? Since the adaptase alters the 5' end, and the adapter contamination is on the 3' end, you should be able to do both in a single step, e.g.:

trim_galore --clip_r1 15 --clip_r2 15 --paired file_R1.fastq.gz file_R2.fastq.gz

Good luck!

Yuxinagli commented 3 years ago

Hi, Felix,

Thank you for your help. You are right, it's the Swift kit. I will try your recommendation to simplify the trimming step. The issue was persistent after I changed the " --parallel" setting. After scrutinization I found the minus symbol in command 1 is "−" but "-". The bismark takes it as "-" at the initiation step but not while running the mapping score check. So I got this warning "Argument "▒M-^HM-^R0.2" isn't numeric in multiplication (*) at /home/yli/src/Bismark/bismark line 3806, <__ANONIO__> line 31." in the standard output. And this warning didn't affect the results report, which I just missed before. Thank you again for your kind help.

Thanks, Yuxiang

FelixKrueger commented 3 years ago

Oh wow, this is really tricky to spot! I suppose we could try to add a test for the - symbol, but I am actually suprised that Bowtie2 doesn't complain about the strange dash you used instead of a minus sign... In any case, I'm glad it seems to be working now!

FelixKrueger / Bismark

about Bismark default run #409

Final Alignment report

Final Alignment report