FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
366 stars 101 forks source link

How to extract methylation levels according to bam file #678

Closed myoucchent closed 1 week ago

myoucchent commented 1 week ago

I used the bismark_methylation_extractor command to extract methylation levels. The command line is as follows: bismark_methylation_extractor -p --ignore 2 --ignore_r2 2 --comprehensive --no_overlap --bedGraph --counts --cytosine_report --report --parallel 4--buffer_size 10G --genome_folder genome.fasta clean_bismark_bt2_pe.deduplicated.bam -o extra I obtained the result files as shown below: 1 I hope to get a file for calculating methylation levels and plotting (which can be imported into IGV), but I don't know how to handle it. Therefore, I used this script with the following command line: coverage2cytosine --genome_folder genome -o coverage2cytosine.output SQ102_clean_1_bismark_bt2_pe.deduplicated.bedGraph.gz The error message is as follows: Use of uninitialized value $meth in addition (+) at /yufan_NAS/basic_NAS_14T/chent/software/Bismark-0.24.2/coverage2cytosine line 1915, line 5365298. Use of uninitialized value $nonmeth in addition (+) at /yufan_NAS/basic_NAS_14T/chent/software/Bismark-0.24.2/coverage2cytosine line 1916, line 5365298. Use of uninitialized value $meth in join or string at /yufan_NAS/basic_NAS_14T/chent/software/Bismark-0.24.2/coverage2cytosine line 400, line 5365298. Use of uninitialized value $nonmeth in join or string at /yufan_NAS/basic_NAS_14T/chent/software/Bismark-0.24.2/coverage2cytosine line 400, line 5365298. Use of uninitialized value $nonmeth in addition (+) at /yufan_NAS/basic_NAS_14T/chent/software/Bismark-0.24.2/coverage2cytosine line 338, line 5365298. I want to know why the error is reported and how should I analyze it. Thank you so much!

FelixKrueger commented 1 week ago

coverage2cytsine takes the .cov.gz file as input, the bedGraph file doesn't have any count information.

myoucchent commented 1 week ago

coverage2cytsine takes the .cov.gz file as input, the bedGraph file doesn't have any count information.

Thanks you so much! Thank you for your help. I ran the following command: coverage2cytosine --genome_folder ../../genome -o coverage2cytosine.output SQ102_clean_1_bismark_bt2_pe.deduplicated.bismark.cov.gz This command produced a file as: 2 What confuses me is why this file contains only CG type methylation and not the other two types of methylation. I checked all the lines, and they only contain CG type methylation. Thank you again for your help!!!!

FelixKrueger commented 1 week ago

CG context is the default, all contexts may be enable via the methylation extractor + bedGraph + cytosine report. Just type --help for more info.

--CX/--CX_context        The sorted bedGraph output file contains information on every single cytosine that was covered
                         in the experiment irrespective of its sequence context. This applies to both forward and
                         reverse strands. Please be aware that this option may generate large temporary and output files
                         and may take a long time to sort (up to many hours). Default: OFF.
                         (i.e. Default = CpG context only).
myoucchent commented 1 week ago

coverage2cytosine

CG context is the default, all contexts may be enable via the methylation extractor + bedGraph + cytosine report. Just type --help for more info.

--CX/--CX_context        The sorted bedGraph output file contains information on every single cytosine that was covered
                         in the experiment irrespective of its sequence context. This applies to both forward and
                         reverse strands. Please be aware that this option may generate large temporary and output files
                         and may take a long time to sort (up to many hours). Default: OFF.
                         (i.e. Default = CpG context only).

Thank you so much!!!!

myoucchent commented 1 week ago

CG context is the default, all contexts may be enable via the methylation extractor + bedGraph + cytosine report. Just type --help for more info.

--CX/--CX_context        The sorted bedGraph output file contains information on every single cytosine that was covered
                         in the experiment irrespective of its sequence context. This applies to both forward and
                         reverse strands. Please be aware that this option may generate large temporary and output files
                         and may take a long time to sort (up to many hours). Default: OFF.
                         (i.e. Default = CpG context only).

CG context is the default, all contexts may be enable via the methylation extractor + bedGraph + cytosine report. Just type --help for more info.

--CX/--CX_context        The sorted bedGraph output file contains information on every single cytosine that was covered
                         in the experiment irrespective of its sequence context. This applies to both forward and
                         reverse strands. Please be aware that this option may generate large temporary and output files
                         and may take a long time to sort (up to many hours). Default: OFF.
                         (i.e. Default = CpG context only).

If I run the first step command is as follows bismark_methylation_extractor -p --ignore 2 --ignore_r2 2 --comprehensive --no_overlap --bedGraph --counts --cytosine_report --report --parallel 4--buffer_size 10G --genome_folder genome.fasta clean_bismark_bt2_pe.deduplicated.bam -o extra The second step is as follows coverage2cytosine --genome_folder .. /.. /genome -o coverage2cytosine.output SQ102_clean_1_bismark_bt2_pe.deduplicated.bismark.cov.gz --CX Do these two steps produce the same result as the following command? bismark_methylation_extractor -p --ignore 2 --ignore_r2 2 --gzip --bedGraph --buffer_size 10G --parallel 20 --CX_context --cytosine_report --report --comprehensive --counts --genome_folder genome SQ102_clean_1_bismark_bt2_pe.deduplicated.bam

FelixKrueger commented 1 week ago

You can run bismark_methylation_extractor: stops at C* methylation all files, or

bismark_methylation_extractor --bedGraph: stops at coverage file (cov.gz), or

bismark_methylation_extractor --bedGraph --cytosine ...: stops at cytosine report.

Alternatively you can run the individual steps via bismark2bedgraph and then coverage2cytosine in a mix & match fashion. All come with --help for full explanations.