Closed demis001 closed 1 year ago
Each row in the coverage file is a single position that was called as a CpG (single-C resolution). If you want to merge the top- and bottom-strand Cs of a CpG dinucleotide and relate them back to the genome, you can run coverage2cytosine --merge_CpG ...
Does this output information similar to the "*cov.gz" file, with counts of methylated and unmethylated reads, but merged across both strands? What I am looking for is a count that summarizes each CpG as a single row, instead of separate rows for the C and the G.
Yes, it will (use --help for more details):
genome-wide CpG report (old)
gi|9626372|ref|NC_001422.1| 157 + 313 156 CG
gi|9626372|ref|NC_001422.1| 158 - 335 156 CG
merged CpG evidence coverage file (new)
gi|9626372|ref|NC_001422.1| 157 158 67.500000 648 312
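The arithmetic behind the merged row can be reproduced by hand. The following is a minimal Python sketch, not the actual coverage2cytosine implementation: it assumes that columns 4 and 5 of the genome-wide report are the methylated and unmethylated counts (which is consistent with the numbers shown above), and the names `CytosineCall` and `merge_cpg` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple, Tuple


class CytosineCall(NamedTuple):
    """One row of the genome-wide CpG report (column meaning is an assumption)."""
    chrom: str
    pos: int
    strand: str
    meth: int      # assumed: count of methylated calls
    unmeth: int    # assumed: count of unmethylated calls


def merge_cpg(top: CytosineCall, bottom: CytosineCall) -> Tuple[str, int, int, float, int, int]:
    """Combine the + strand C and the - strand C of one CpG into a single row."""
    if top.chrom != bottom.chrom or bottom.pos != top.pos + 1:
        raise ValueError("rows are not the two Cs of the same CpG dinucleotide")
    if top.strand != "+" or bottom.strand != "-":
        raise ValueError("expected a top-strand row followed by a bottom-strand row")

    meth = top.meth + bottom.meth
    unmeth = top.unmeth + bottom.unmeth
    total = meth + unmeth
    # Returning 0.0 for an uncovered CpG is a choice made for this sketch only.
    pct = 100.0 * meth / total if total else 0.0
    return (top.chrom, top.pos, bottom.pos, pct, meth, unmeth)


chrom = "gi|9626372|ref|NC_001422.1|"
merged = merge_cpg(
    CytosineCall(chrom, 157, "+", 313, 156),
    CytosineCall(chrom, 158, "-", 335, 156),
)
# Print in the same column order as the merged coverage row above.
print("{} {} {} {:.6f} {} {}".format(*merged))
```

With the two example rows, the summed counts are 648 methylated and 312 unmethylated, and 648 / (648 + 312) gives the 67.5% shown in the merged row.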
I will let you know after the test run is complete. The idea is to run a multivariate analysis in packages like DSS and bsseq.
mkdir merged_coverage
coverage2cytosine --merge_CpG --gzip --output merged_coverage --genome_folder /datamain/genome/hg38_r109/bismarkindx 184_S52_L003_R1_001_val_1_bismark_bt2_pe.deduplicated.bam
Dereje
I don't see a multi-thread option. Is this single-threaded? I have 100 samples.
It also shows a lot of errors while running:
Use of uninitialized value within %chromosomes in pattern match (m//) at /home/ddjimamain/bin/Bismark-0.24.1/coverage2cytosine line 239,
The input for coverage2cytosine is a coverage file (cov.gz), not a BAM file.
@FelixKrueger
Is there an easy way to represent each CpG with a single row in the "*.cov.gz" file for paired-end data?
Best, @demis001