PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data
BSD 3-Clause Clear License

How to prepare input data for phased 5mC methylation analysis? #43

Closed oceancongliu closed 1 year ago

oceancongliu commented 1 year ago

Hi, thank you for your great tool. I have a dataset from the PacBio Sequel IIe platform. I used these CCS reads for a diploid genome assembly, phased into paternal and maternal genomes (trio assembly with hifiasm, default parameters) using Illumina sequencing data (PE150) from the parents. My question is whether pb-CpG-tools can phase 5mC methylation data for differential methylation analysis of the two haplotype genomes. From your description, the --hap-tag parameter can be set to identify haplotypes, but I do not know how to prepare input in this format (SAM tag). Could you offer an example of such a file? It would make it easier for users to run this tool. Thanks.

ctsa commented 1 year ago

We describe a bit more detail on what's expected at the end of this section:

https://github.com/PacificBiosciences/pb-CpG-tools#output-modes-and-option-details

Using the --hap-tag flag allows an arbitrary SAM tag to be used to identify haplotypes, rather than the default HP tag. The haplotype values must be 0, 1, and 2, where 0 is not assigned/ambiguous.

The phasing tool is primarily tested on BAMs that have been run through whatshap haplotag. That tool adds HP fields to the BAM records, such as HP:i:1 or HP:i:2, to annotate the phasing of individual reads.

If you select a different tag to use for the haplotype information, it will still need to follow the format of providing only 0, 1 or 2 as indicators.
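To make the expected tag format concrete, here is a minimal sketch that checks SAM text records for a haplotype tag and counts how many reads fall into each value. It is not part of pb-CpG-tools: the function names are hypothetical, the example records are made up, and treating a read without the tag as 0 (unassigned) is an assumption of this sketch rather than documented tool behavior.

```python
from collections import Counter

def haplotype_of(sam_line, tag="HP"):
    """Return the integer value of an optional tag such as HP:i:1.

    Assumption of this sketch: a read without the tag counts as 0 (unassigned).
    """
    fields = sam_line.rstrip("\n").split("\t")
    # In SAM text, optional TAG:TYPE:VALUE fields follow the 11 mandatory columns.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag and parts[1] == "i":
            return int(parts[2])
    return 0

def count_haplotypes(sam_lines, tag="HP"):
    """Count reads per haplotype value, rejecting anything other than 0, 1 or 2."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        value = haplotype_of(line, tag)
        if value not in (0, 1, 2):
            raise ValueError(f"unexpected {tag} value {value}")
        counts[value] += 1
    return counts

# Made-up records: one read per haplotype plus one untagged read.
mandatory = "chr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
records = [
    "@HD\tVN:1.6",
    f"read1\t0\t{mandatory}\tHP:i:1",
    f"read2\t0\t{mandatory}\tHP:i:2",
    f"read3\t0\t{mandatory}",
]
print(count_haplotypes(records))
```

In practice the lines could come from the text output of `samtools view -h` on the haplotagged BAM; pass a different `tag` value if you use --hap-tag with a custom tag.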

oceancongliu commented 1 year ago

Thank you for your reply. It is a good idea and worth trying.

oceancongliu commented 1 year ago

I am sorry to reopen this question. Following your comment, I used whatshap haplotag to tag my BAM file successfully. I then ran pb-CpG-tools to analyze methylation, which generated xxx.cpg.hap1.bed and xxx.cpg.hap2.bed. I would like to know which file belongs to the paternal haplotype and which to the maternal one. How can I determine that? Thanks.

ctsa commented 1 year ago

Hi @oceancongliu, the methylation pileup tool is able to take HP read tags into account to create the corresponding hap1 and hap2 bed file outputs, but it is not a phasing or read haplotype assignment tool itself. You'll need to work with other available tools to get the BAM phasing tags set up the way you'd prefer. You might be able to get this done directly with your assembler's toolchain, or you might be able to use something like pedigree phasing in whatshap (https://whatshap.readthedocs.io/en/latest/guide.html#phasing-pedigrees) to get these tags arranged such that they consistently refer to maternal/paternal haplotypes.
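As a rough illustration of what "arranging" the tags could mean, the sketch below swaps haplotype values 1 and 2 on SAM text records that belong to phase sets you have already determined to be labeled the wrong way round. Everything here is an assumption for illustration: the function names are hypothetical, the records are assumed to carry a phase-set tag (PS is used as the default), and the set of phase sets to flip must come from your own parental analysis. Neither pb-CpG-tools nor whatshap provides this function.

```python
def get_int_tag(fields, tag):
    """Return the integer value of an optional SAM tag, or None if it is absent."""
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag and parts[1] == "i":
            return int(parts[2])
    return None

def relabel_record(sam_line, flipped_phase_sets, hap_tag="HP", ps_tag="PS"):
    """Swap haplotype 1 and 2 on a record whose phase set is in flipped_phase_sets.

    Records in other phase sets, or without a phase-set tag, are returned unchanged.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if get_int_tag(fields, ps_tag) not in flipped_phase_sets:
        return "\t".join(fields)
    swap = {"1": "2", "2": "1"}  # a value of 0 (unassigned) is left as it is
    for i in range(11, len(fields)):
        parts = fields[i].split(":", 2)
        if len(parts) == 3 and parts[0] == hap_tag and parts[1] == "i":
            fields[i] = f"{hap_tag}:i:{swap.get(parts[2], parts[2])}"
    return "\t".join(fields)

# Made-up record in a hypothetical phase set 1000 that should be flipped.
record = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tHP:i:1\tPS:i:1000"
print(relabel_record(record, flipped_phase_sets={1000}))
```

After relabeling so that, say, 1 always means maternal, the hap1 and hap2 bed outputs would inherit that meaning. Deciding which phase sets need flipping is the hard part and is exactly what pedigree-aware phasing is meant to resolve.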

oceancongliu commented 1 year ago

Thanks, I have contacted the author of whatshap and hope for an answer. I suggest that PacBio develop a tool like whatshap for phasing CCS HiFi reads; it would be a big step forward for data mining. Thank you again for your help, and I hope you have a good day.

ctsa commented 1 year ago

We do have a HiFi phasing tool under development here:

https://github.com/PacificBiosciences/HiPhase

Although it is quite capable already, it doesn't have any pedigree phasing ability, so for your specific case here I don't think it would be helpful. Feel free to give it a try for diploid read-backed phasing problems though.