PacificBiosciences / HiPhase

Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads

One read has two HP tags #48

Closed ywzhang071394 closed 1 week ago

ywzhang071394 commented 1 week ago

Hi,

Thank you for the nice tool. We recently used HiPhase to phase our PacBio data and encountered a weird issue. As shown in the figure below, one read with a supplementary alignment has two different HP tags. This only happened when we included pbsv SVs in phasing. The command was:

hiphase --bam Normal.sorted.bam --output-bam normal_haplotag_sv.bam --bam HT293_Tumor.sorted.bam --output-bam tumor_haplotag_sv.bam --vcf normal_deepvariant.vcf.gz --output-vcf normal_deepvariant.phased.vcf.gz --vcf 03_pbsv/normal_pbsv.vcf.gz --output-vcf 03_pbsv/normal_pbsv_phased.vcf.gz --reference /data/human/t2t_chm13/chm13v2.0.fa --threads 2 --ignore-read-groups

When I removed the pbsv VCF, both alignments of this read had the same HP tag. Why did this happen? Thank you!

image

holtjma commented 1 week ago

Hi @ywzhang071394,

HiPhase generates two key pieces of information for interpreting phase: the phase set (PS) and the haplotype (HP). If the phase sets are different, then the haplotypes are not comparable in any way. If they are the same, then the HP tags are comparable.

Supplementary alignments of a read should all share the same HP if they are within the same phase set. If they are not in the same phase set, the HP tags can differ.

In your example, the two alignments are very far apart (43Mbp and 67Mbp) and definitely reside in different phase blocks. So when you include pbsv, they by chance have different HP tags. Similarly, when you exclude pbsv, they by chance have the same HP tags.
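To make the comparison rule concrete, here is a minimal Python sketch (not HiPhase code) that flags a read only when its alignments disagree on HP inside the same phase set. The function name `find_hp_conflicts`, the tuple layout, and the sample records are all hypothetical, and the sketch assumes a phase set is identified by the pair of contig and PS value.

```python
from collections import defaultdict


def find_hp_conflicts(alignments):
    """Return read names whose alignments disagree on HP within one phase set.

    `alignments` is an iterable of (read_name, contig, ps, hp) tuples.
    Alignments with ps or hp set to None (unphased) are skipped.
    """
    # (read_name, contig, ps) -> set of HP values seen in that phase set.
    # Assumption: PS values are only unique within a contig.
    hp_seen = defaultdict(set)
    for read_name, contig, ps, hp in alignments:
        if ps is None or hp is None:
            continue
        hp_seen[(read_name, contig, ps)].add(hp)

    # A conflict exists only when one phase set holds more than one HP value.
    return sorted({key[0] for key, hps in hp_seen.items() if len(hps) > 1})


# Hypothetical records: read_a has different HP tags in two different phase
# sets (not a conflict); read_b has different HP tags in the same phase set.
records = [
    ("read_a", "chr1", 43000000, 1),
    ("read_a", "chr1", 67000000, 2),
    ("read_b", "chr1", 43000000, 1),
    ("read_b", "chr1", 43000000, 2),
]
print(find_hp_conflicts(records))  # prints ['read_b']
```

With pysam, the tuples could be gathered from each record's `query_name`, `reference_name`, and `get_tag("PS")` / `get_tag("HP")` after checking `has_tag`; that wiring is left out here so the sketch runs without a BAM file.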

Does that help clarify the issue?

Matt

ywzhang071394 commented 1 week ago

Thank you for the explanation! I will close this issue.