PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License

how about cpg_tools of Pacbio #18

Closed hmyh1202 closed 2 years ago

hmyh1202 commented 2 years ago

Hello,

What is the difference between PacBio's pacbio_cpg_tools and your ccsmeth software? And where can I get the model to use with --model_file /path/to/ccsmeth/models/model.ckpt?

Thank you!

PengNi commented 2 years ago

Hi @hmyh1202, thanks for your interest! pacbio_cpg_tools is great and has high performance. ccsmeth has a different model architecture from primrose and pacbio_cpg_tools. However, we haven't yet trained a stable ccsmeth model. I will release the model ASAP.

Best, Peng

hmyh1202 commented 2 years ago

pacbio_cpg_tools outputs only about 30M CpG sites, while Bismark on 30x NGS data can report more than 40M CpG sites. How many sites does your tool report in your tests? Thank you!

PengNi commented 2 years ago

We also output only about 30M CpGs; 30M is roughly the number of all CpGs in the human genome when only forward-strand CpGs are counted. I think Bismark outputs 40M CpGs because it treats reverse-strand CpGs as separate sites from forward-strand CpGs and reports each of them if it is covered by reads. However, in mammals, the methylation status of the cytosines on the two strands of a CpG is symmetric in most cases (both methylated or both unmethylated), so it is fine to output only forward-strand CpGs.
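To illustrate the strand argument, here is a minimal sketch of how per-strand CpG counts could be folded into one entry per CpG dinucleotide. The record layout (chrom, 0-based cytosine position, strand, methylated count, unmethylated count) and the function name `merge_cpg_strands` are hypothetical, chosen for illustration; they are not the output format of ccsmeth, pacbio_cpg_tools, or Bismark.

```python
from collections import defaultdict


def merge_cpg_strands(records):
    """Combine per-strand CpG counts into one entry per CpG dinucleotide.

    records: iterable of (chrom, pos, strand, n_meth, n_unmeth), where pos is
    the 0-based position of the cytosine (hypothetical layout).
    Returns {(chrom, forward_C_pos): (n_meth, n_unmeth)}.
    """
    merged = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, n_meth, n_unmeth in records:
        # The reverse-strand cytosine of a CpG pairs with the G, which sits
        # one base downstream of the forward-strand cytosine.
        key = (chrom, pos if strand == "+" else pos - 1)
        merged[key][0] += n_meth
        merged[key][1] += n_unmeth
    return {key: tuple(counts) for key, counts in merged.items()}


records = [
    ("chr1", 100, "+", 8, 2),  # forward-strand C of a CpG
    ("chr1", 101, "-", 7, 3),  # reverse-strand C of the same CpG
    ("chr1", 250, "+", 0, 5),  # CpG covered on the forward strand only
]
merged = merge_cpg_strands(records)
```

With this kind of merging, two strand-specific rows collapse into one site, which is one way a strand-aware report can list more sites than a forward-strand-only report. Merging assumes symmetric methylation, so it hides any hemi-methylated CpGs.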

hmyh1202 commented 2 years ago

Thanks for your reply!

Yes, Bismark treats reverse-strand CpGs as distinct from forward-strand ones and reports the cytosines on both strands, because hemi-methylation of CpG dinucleotides is also a point worth discussing.

Another question: how can I calculate the methylation level of a region based on the 30M sites? In general, a region's level is calculated either as total mC count / (total mC count + total unmethylated C count), or as number of mC sites / (number of all C sites in the reference). Which method should I choose, or is there another one?

Thank you.

PengNi commented 2 years ago

In my view, since we can give a binary label (0/1) to each CpG in each read, total mC count / (total mC count + total unmethylated C count) at the read level is a common way to measure the methylation level of a region.
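As a rough sketch, the two region-level measures raised in this thread could be computed as follows. The function names, the per-site input layout (methylated read count, unmethylated read count), and the 0.5 cutoff used to call a site methylated are assumptions made for illustration; none of them comes from ccsmeth.

```python
def read_level_methylation(sites):
    """Read-level ratio: total mC calls / (total mC calls + total unmethylated calls).

    sites: list of (n_meth_reads, n_unmeth_reads), one entry per CpG in the region.
    Returns None if no read covers the region.
    """
    total_meth = sum(m for m, _ in sites)
    total_calls = sum(m + u for m, u in sites)
    return total_meth / total_calls if total_calls else None


def mc_site_density(sites, n_reference_sites, min_fraction=0.5):
    """Site-level density: methylated sites / all candidate sites in the reference.

    A site counts as methylated when its per-site fraction is >= min_fraction
    (the cutoff is an assumption; the thread does not specify one).
    """
    methylated_sites = sum(
        1 for m, u in sites if (m + u) > 0 and m / (m + u) >= min_fraction
    )
    return methylated_sites / n_reference_sites if n_reference_sites else None


# Three covered CpGs in a region whose reference sequence holds four CpGs.
sites = [(8, 2), (0, 10), (5, 5)]
level = read_level_methylation(sites)                   # 13 / 30
density = mc_site_density(sites, n_reference_sites=4)   # 2 / 4
```

The read-level ratio weights each site by its coverage, while the density depends on the chosen cutoff and on how many reference sites are uncovered, so the two numbers are not interchangeable.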

hmyh1202 commented 2 years ago

Fine, the second method is an mC density level. Thank you!