FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
381 stars 101 forks source link

Getting strand information from/for bedGraph/coverage files #700

Open jonasbucherETH opened 3 days ago

jonasbucherETH commented 3 days ago

Is there a straightforward way to get strand information for the bedGraph or coverage files produced by Bismark, without generating a genome-wide cytosine report? I would like to avoid that for runtime reasons, as I don't need the genome-wide report for further analysis. I have been trying to figure out a way with bedtools by using the respective bam file, but I was wondering if there is a more simple solution to this.

FelixKrueger commented 3 days ago

coverage2cytosine associates strand information with the positions reported in the coverage file by going through the entire genome sequence, scanning it for Cs (in CpG context in the default mode) and testing if that position was covered in the the coverage file. This approach is reference-based and produces equivalent output between different runs (unless you use a coverage threshold > 0).

Another option could be to scan the coverage file directly, and check if a reported position is a C (+ strand) or G (- strand), however there is currently no option go down this route (it might be a fairly straight forward task for ChatGPT though?). The downside of such an approach is that the output file is different for each sample, and since it is not strictly reference genome based it doesn't have a cytosine context check in place; as a result you may include a certain number of context differences in the output file.