PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License

confusion about the traditional ccsmeth procedure #29

Closed: wzhang42 closed this issue 1 year ago

wzhang42 commented 1 year ago

Hi Peng, thanks for developing ccsmeth. I am reading your paper "DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing" and trying out the ccsmeth tool provided here. I am a little confused about the original ccsmeth procedure. In your paper, Fig. 1 and Fig. 4 clearly show that subreads were used for alignment and for calling methylation with ccsmeth, while the CCS reads (HiFi reads) were used for SNV calling and haplotype phasing. This makes sense: the kinetics-related PacBio BAM tags "ipd" and "pw" of the raw subreads are used for training and testing.
However, here it seems that HiFi reads (produced by CCS calling from subreads) are the input to ccsmeth. I want to confirm whether ccsmeth can use both subreads and CCS/HiFi reads for 5mCpG detection. If so, what is the difference between the two approaches? As you know, CCS/HiFi reads are generated from subreads, and their kinetics are recorded in ccs.bam as the averaged tags "fi", "ri", "fp", and "rp".
Additionally, I found that primrose --> pbmm2 --> pb-CpG-tools is the HiFi-read-based methylation pipeline provided by PacBio. What are the advantages of ccsmeth if it is applied to HiFi-read-based methylation calling? Thank you in advance. Wenchao
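The question hinges on how per-subread kinetics (ipd, pw) relate to the averaged HiFi tags (fi/fp for the forward strand, ri/rp for the reverse strand). A minimal sketch of that relationship, assuming each subread's values are already aligned to consensus positions (None where a subread does not cover a base) and that a plain arithmetic mean is taken per position. The function name `average_kinetics` is hypothetical; this is not the ccs or ccsmeth implementation, and the real tags may additionally be stored in a lossy 8-bit encoding.

```python
from statistics import mean
from typing import List, Optional, Sequence


def average_kinetics(
    subread_values: Sequence[Sequence[Optional[float]]],
) -> List[Optional[float]]:
    """Per-position mean across subreads of one strand (hypothetical sketch).

    Each inner sequence holds one subread's IPD (or PW) values aligned to the
    consensus; None marks a position that subread does not cover.
    """
    if not subread_values:
        return []
    length = len(subread_values[0])
    if any(len(values) != length for values in subread_values):
        raise ValueError("all subreads must be aligned to the same consensus length")
    averaged: List[Optional[float]] = []
    for pos in range(length):
        # Only subreads that actually cover this position contribute.
        observed = [values[pos] for values in subread_values if values[pos] is not None]
        averaged.append(mean(observed) if observed else None)
    return averaged


# Two forward-strand subreads on a 4-base consensus; the second misses position 2.
forward_passes = [
    [10.0, 20.0, 30.0, 40.0],
    [12.0, 18.0, None, 44.0],
]
print(average_kinetics(forward_passes))  # [11.0, 19.0, 30.0, 42.0]
```

The sketch shows what averaging discards: the missing value at position 2 simply drops out, and the pass-to-pass spread of the IPD values is no longer visible once only the mean is kept.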

PengNi commented 1 year ago

Hi Wenchao, we have updated ccsmeth so that it only accepts HiFi reads as input and only uses the averaged kinetics as features. Compared to primrose, ccsmeth uses a different deep learning model.

Best, Peng
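Since the current ccsmeth takes HiFi reads as input with the averaged kinetics as features, per-CpG model input can be pictured as a fixed window of bases and kinetic values centered on each CpG. The sketch below is hypothetical: `cpg_windows` and `half_width` are illustrative names, the window size is arbitrary, only the forward-strand tags fi/fp are used, and the values are assumed to be already decoded and in read orientation. ccsmeth's actual feature layout may differ.

```python
from typing import List, Tuple

Window = Tuple[int, str, List[int], List[int]]


def cpg_windows(seq: str, fi: List[int], fp: List[int], half_width: int = 5) -> List[Window]:
    """Collect sequence plus forward-strand IPD/PW around each CpG (hypothetical sketch).

    Returns (cpg_position, window_sequence, window_fi, window_fp) tuples.
    """
    if not (len(seq) == len(fi) == len(fp)):
        raise ValueError("sequence and kinetic arrays must have equal length")
    windows: List[Window] = []
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            start, end = i - half_width, i + half_width + 1
            if start < 0 or end > len(seq):
                continue  # skip CpGs too close to the read ends for a full window
            windows.append((i, seq[start:end], fi[start:end], fp[start:end]))
    return windows


read_seq = "AACGTTACGA"
for w in cpg_windows(read_seq, list(range(10)), list(range(10, 20)), half_width=2):
    print(w)
# (2, 'AACGT', [0, 1, 2, 3, 4], [10, 11, 12, 13, 14])
# (7, 'TACGA', [5, 6, 7, 8, 9], [15, 16, 17, 18, 19])
```

A full caller would also need the reverse-strand tags (ri/rp) for the C on the opposite strand of each CpG; that is left out here to keep the example short.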

wzhang42 commented 1 year ago

Hi Peng, thanks for your reply. I think the original version of ccsmeth could be applied to subread.bam, while the current version only accepts HiFi reads. Why not keep both, since your recently published paper used subread kinetic features as input? What did your evaluation conclude about using the averaged kinetic features of HiFi reads versus the subread kinetic features (IPD and PW) from subread.bam? Does using the averaged HiFi kinetics give better results? If so, why? Additionally, the averaged kinetic tags (fi, ri, fp, rp) in Hifi_reads.bam can be missing for some CCS reads; how should this case be handled?
Many thanks in advance
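For the missing-tag case, one simple policy is to skip reads whose averaged kinetic tags are absent or whose lengths do not match the read. The sketch below works on a plain dict of tags (with pysam, `dict(read.get_tags())` produces one) and is an assumption, not ccsmeth's documented behavior; `usable_kinetics` is a hypothetical name. It also reverses ri/rp, assuming the reverse-strand tags are stored in the reverse strand's native orientation; check the PacBio BAM specification for the version in use.

```python
from typing import Dict, List, Optional, Sequence

KINETIC_TAGS = ("fi", "ri", "fp", "rp")


def usable_kinetics(
    tags: Dict[str, Sequence[int]], seq_len: int
) -> Optional[Dict[str, List[int]]]:
    """Return the four averaged kinetic arrays, or None if the read should be skipped.

    Hypothetical policy: skip when any tag is absent or its length differs from
    the read length. ri/rp are reversed (orientation is an assumption) so index i
    refers to the same read position as fi/fp. The input dict is not modified.
    """
    if any(tag not in tags for tag in KINETIC_TAGS):
        return None
    arrays = {tag: list(tags[tag]) for tag in KINETIC_TAGS}  # copies, safe to mutate
    if any(len(values) != seq_len for values in arrays.values()):
        return None
    arrays["ri"].reverse()
    arrays["rp"].reverse()
    return arrays


read_tags = {"fi": [1, 2, 3], "ri": [4, 5, 6], "fp": [7, 8, 9], "rp": [10, 11, 12]}
print(usable_kinetics(read_tags, 3)["ri"])  # [6, 5, 4]
print(usable_kinetics({"fi": [1], "fp": [1]}, 1))  # None: reverse-strand tags missing
```

Skipping loses those reads entirely; an alternative would be to keep them and mask the missing strand's features, which trades coverage for noisier input.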