PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data
BSD 3-Clause Clear License

model-based pileup %methylation is independent of the number of reads #1

Closed rhallPB closed 2 years ago

rhallPB commented 2 years ago

Carryover issue:

Because the pileup is model-based, the % methylation no longer corresponds directly to the number of methylated reads. For example, the line below has 4x coverage, but its % methylation is neither 75% nor 100% (the values expected if 3 or 4 reads were methylated). Some downstream tools, such as MethylSeekR, use the counts of methylated and unmethylated reads to find UMRs. Would it be possible to also output these counts, or would that be confusing? Alternatively, both the frequency-based and the model-based % methylation could be output.

B73_chr1 5295 5296 94.5 merged Total 4

rhallPB commented 2 years ago

Note that it is currently possible to run the pileup twice to generate both the model-based and the strict count-based numbers. My preferred solution would be to add an option that rounds the % methylation to the nearest value achievable given the number of reads (the sampling frequency), and to report that either as the only % methylation or as an additional column in the BED file. @amwenger @dportik
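
To make the rounding idea concrete, here is a rough sketch of what "rounding by the sampling frequency" could look like. It is not the tool's implementation: the function name `round_to_read_fraction` is hypothetical, and the column layout of the example line (fourth field as the model-based % methylation, last field as coverage) is an assumption based on the line quoted above.

```python
import math

def round_to_read_fraction(pct_model, coverage):
    """Snap a model-based % methylation to the nearest k/coverage step.

    Returns (k, pct), where k is the implied number of methylated reads
    and pct is 100 * k / coverage. Hypothetical helper for illustration.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    # Nearest whole number of methylated reads, rounding halves up.
    k = math.floor(pct_model / 100.0 * coverage + 0.5)
    # Clamp in case the input is slightly outside 0..100.
    k = max(0, min(coverage, k))
    return k, round(100.0 * k / coverage, 1)

# Assumed layout: field 4 is the model-based %, the last field is coverage.
line = "B73_chr1 5295 5296 94.5 merged Total 4"
fields = line.split()
est_mod, rounded_pct = round_to_read_fraction(float(fields[3]), int(fields[-1]))
print(est_mod, coverage_minus := int(fields[-1]) - est_mod, rounded_pct)
```

With 4x coverage the only achievable values are 0, 25, 50, 75 and 100%, so 94.5% snaps to 100%, i.e. 4 methylated and 0 unmethylated reads. Counts derived this way are estimates from the model score, not strict per-read calls, so running the count-based pileup is still the way to get exact numbers.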

dportik commented 2 years ago

Features added in https://github.com/PacificBiosciences/pb-CpG-tools/pull/2