PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data
BSD 3-Clause Clear License

difference between -m denovo and reference #54

Closed: Duyh814 closed this issue 11 months ago

Duyh814 commented 12 months ago

@ctsa May I ask what the underlying difference is between denovo and reference mode? If the program is in denovo mode and finds a CpG site that does not exist at the corresponding coordinates in the reference genome, will it still generate start and end coordinates based on the reference genome?

Duyh814 commented 11 months ago

@ctsa @dportik In other words, can methylated sites on non-reference sequences (i.e. fragments where the read and the reference do not match) be obtained using -m denovo or -m reference?

ctsa commented 11 months ago

Hi @Duyh814, sorry for the delayed reply. Denovo mode only summarizes sites that can be mapped to a reference coordinate, such as a CpG introduced by a SNP. In particular, any CpG occurring in an insertion or breakpoint insertion sequence will not be included. So long as we're writing the results in BED format, which needs a reference start and end for every record, this is unlikely to change. One way to query CpGs within insertions using this tool would be to map a sample back to its own assembly contigs and then run pb-CpG-tools on that BAM. That is obviously a big lift, but it is the primary workaround that comes to mind.
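To illustrate why a CpG inside an insertion has no coordinate to report, here is a minimal Python sketch that walks a CIGAR string and assigns each read base a reference position. This is not pb-CpG-tools' actual code; the function names `read_to_ref_positions` and `cpg_sites` and the toy read are hypothetical, and the logic only follows the standard SAM CIGAR rules for which operations consume read and reference bases.

```python
import re


def read_to_ref_positions(cigar, ref_start):
    """Return one entry per read base: its reference coordinate, or None.

    Hypothetical helper for illustration only. Bases in insertions (I) and
    soft clips (S) consume the read but not the reference, so they get None.
    """
    positions = []
    ref_pos = ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":      # aligned bases: consume read and reference
            positions.extend(range(ref_pos, ref_pos + n))
            ref_pos += n
        elif op in "IS":     # read-only bases: no reference coordinate
            positions.extend([None] * n)
        elif op in "DN":     # reference-only bases: advance the reference
            ref_pos += n
        # H and P consume neither the read nor the reference
    return positions


def cpg_sites(read_seq, cigar, ref_start):
    """List (read_index, ref_coordinate) for the C of every CG in the read."""
    positions = read_to_ref_positions(cigar, ref_start)
    return [
        (i, positions[i])
        for i in range(len(read_seq) - 1)
        if read_seq[i:i + 2] == "CG"
    ]


# Toy read: 4 aligned bases, a 3-base insertion, then 3 aligned bases.
sites = cpg_sites("ACGTTCGAAC", "4M3I3M", 100)
print(sites)  # [(1, 101), (5, None)]
```

The first CpG falls in an aligned block and gets reference position 101, so it could be written as a BED record. The second sits entirely inside the insertion and maps to `None`, which is the case that only shows up if the reads are aligned to the sample's own assembly contigs instead.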

Duyh814 commented 11 months ago

@ctsa Thank you very much for the detailed explanation of why insertions and breakpoint insertions are not summarized in denovo mode. I appreciate you taking the time to clarify the technical reasons behind this limitation. Thanks again for the insightful suggestions. Have a good time!