Closed gokalpyildirir closed 1 year ago
Hi Goko, You should be able to add the option --min-mapq 0
to work around this.
Thank you for the suggestion @ctsa ! By doing that, I'm afraid I will introduce false positives. I imagine most of the MAPQ=0 reads are coming from identical regions between haplotypes since I am mapping the reads onto both haplotypes at the same time. If I force pb-cpg-tools to analyze these reads, it would produce the methylation frequencies but I still couldn't tell if these methylations belong to one haplotype or the other. That is why I think I need to modify my pipeline, maybe starting from the mapping step?
Hi @gokalpyildirir , Thanks for explaining this in more detail. If the reads will not uniquely map to one of the two haplotypes, it would seem that you won't be able to use them for this analysis. Perhaps within the ambiguous regions you could also find regions with bimodal methylation that clustered into two methylation haplotypes, and attempt to assign these to the corresponding parental haplotypes.
Thanks for the input! This is almost like trying to differentiate two faces by only focusing on the similar features... I will now try to map my reads onto one haplotype only, go through the haplotagging process and see what happens. Take care!
Hello,
I'm currently working on a project involving a heterokaryotic fungus that contains two known parental haplotypes within the same strain. I'm working on studying the differential methylation frequencies on these haplotypes using HiFi reads with kinetic information. I'm seeking suggestions for a pipeline that would be suitable for this analysis.
Here's an overview of my current pipeline:
However, I have encountered an issue during the mapping step. Upon examining IGV, I noticed that some alignments are displayed as white reads (MAPQ=0). These reads map to multiple locations with same scores, making it impossible to determine their actual origin. These regions typically consist of highly repetitive regions or areas with shared genes between the haplotypes. As a result, pb-CpG-tools disregards these reads and does not calculate the final methylation frequency for these regions. This poses a challenge for further analysis, as our primary goal is to compare the two haplotypes, and the majority of methylation events are observed in these repetitive elements. I need to obtain information on these repetitive regions.
I would greatly appreciate your feedback on my pipeline and any suggestions for modifications. Any assistance you can provide would be highly valuable.
Thank you for your support!
Best, Goko