More than 29 millions rows in the coverage file

FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

http://felixkrueger.github.io/Bismark/

GNU General Public License v3.0


More than 29 millions rows in the coverage file #635

Closed demis001 closed 1 year ago

demis001 commented 1 year ago

@FelixKrueger

Is there an easy way to represent each CpG with a single row in the "*.cov.gz" file for paired-end data?

Best, @demis001

FelixKrueger commented 1 year ago

Each row in the coverage file is a single position that was called as a CpG (single-C resolution). If you want to merge the top- and bottom-strand Cs of a CpG dinucleotide and relate them back to the genome, you can run coverage2cytosine --merge_CpG ...

demis001 commented 1 year ago

Does this output information similar to the "*cov.gz" file, with counts of methylated and unmethylated reads, but merged for both strands? What I am looking for is a count that summarizes each CpG in a single row, instead of the C and the G separately.

FelixKrueger commented 1 year ago

Yes, it will (use --help for more details):

  genome-wide CpG report (old)
  gi|9626372|ref|NC_001422.1|     157     +       313     156     CG
  gi|9626372|ref|NC_001422.1|     158     -       335     156     CG
  merged CpG evidence coverage file (new)
  gi|9626372|ref|NC_001422.1|     157     158     67.500000       648     312
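
The arithmetic behind the merged row can be checked by hand. The following is only a minimal sketch of what the example above implies (summing the methylated and unmethylated counts of the two strands and recomputing the percentage); it is not Bismark's actual code, and the function name `merge_cpg` is hypothetical.

```python
def merge_cpg(top, bottom):
    """Combine the top- and bottom-strand rows of one CpG into a single row.

    Each input row follows the column order of the genome-wide CpG report
    shown above: (chromosome, position, strand, methylated, unmethylated, context).
    """
    chrom, pos_top, _, meth_top, unmeth_top, _ = top
    _, pos_bottom, _, meth_bottom, unmeth_bottom, _ = bottom

    meth = meth_top + meth_bottom
    unmeth = unmeth_top + unmeth_bottom
    total = meth + unmeth
    # Percentage methylation over both strands; 0.0 if the CpG has no coverage.
    pct = 100.0 * meth / total if total else 0.0
    return (chrom, pos_top, pos_bottom, pct, meth, unmeth)


top = ("gi|9626372|ref|NC_001422.1|", 157, "+", 313, 156, "CG")
bottom = ("gi|9626372|ref|NC_001422.1|", 158, "-", 335, 156, "CG")

merged = merge_cpg(top, bottom)
print("{}\t{}\t{}\t{:.6f}\t{}\t{}".format(*merged))
# gi|9626372|ref|NC_001422.1|	157	158	67.500000	648	312
```

Here 313 + 335 = 648 methylated and 156 + 156 = 312 unmethylated calls, so the merged percentage is 648 / 960 = 67.5%.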

demis001 commented 1 year ago

I will let you know after the test run is complete. The idea is to run a multivariate analysis in packages like DSS and bsseq.

  mkdir merged_coverage
  coverage2cytosine --merge_CpG --gzip --output merged_coverage --genome_folder /datamain/genome/hg38_r109/bismarkindx 184_S52_L003_R1_001_val_1_bismark_bt2_pe.deduplicated.bam

Dereje

demis001 commented 1 year ago

I don't see a multi-thread option. Is this single-threaded? I have 100 samples.

demis001 commented 1 year ago

It also shows a lot of errors while running:

Use of uninitialized value within %chromosomes in pattern match (m//) at /home/ddjimamain/bin/Bismark-0.24.1/coverage2cytosine line 239, line 6840.

FelixKrueger commented 1 year ago

The input for coverage2cytosine is a coverage file (cov.gz), not a BAM file.
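
Since the thread mentions 100 samples and no multi-thread option, one possible workaround is to start several coverage2cytosine processes side by side, each on its own coverage file. The sketch below is an assumption-laden illustration, not a documented Bismark workflow: the helper names, the `*.cov.gz` glob, the output naming, and the worker count are all hypothetical, and only the flags already used in this thread (`--merge_CpG`, `--gzip`, `--output`, `--genome_folder`) appear in it. Each process likely needs to read the genome itself, so memory may limit how many can run at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(cov_file, genome_folder, out_name):
    """Build the coverage2cytosine call for one coverage file (not a BAM file)."""
    return [
        "coverage2cytosine",
        "--merge_CpG",
        "--gzip",
        "--genome_folder", str(genome_folder),
        "--output", out_name,
        str(cov_file),
    ]


def run_all(cov_files, genome_folder, max_workers=4):
    """Run one coverage2cytosine process per coverage file, a few at a time."""

    def run_one(cov_file):
        # Hypothetical naming scheme: reuse the input name without ".cov.gz".
        out_name = Path(cov_file).name.replace(".cov.gz", "")
        subprocess.run(build_command(cov_file, genome_folder, out_name), check=True)

    # Threads only wait on the external processes, so a thread pool is enough.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run_one, cov_files))


# Example call (assumes the coverage files sit in the current directory):
# run_all(sorted(Path(".").glob("*.cov.gz")),
#         "/datamain/genome/hg38_r109/bismarkindx")
```

Check `coverage2cytosine --help` for the exact meaning of each option before running this on real data.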