PacificBiosciences / HiPhase

Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads

Can the three output VCFs be directly concatenated? #41

Closed WeiCSong closed 1 month ago

WeiCSong commented 1 month ago

Hi, I'm wondering whether the phasing of SNVs, SVs, and TRs is consistent across the three output VCFs, so that the files can be concatenated directly. For example, if variants chr1_1_SNV1, chr1_100_SV1, and chr1_1000_TR1 all have genotype 1|0 in their own VCF, then after concatenation it will appear that all three are on the same haplotype. Is this always correct? Thanks for your help.

Best

holtjma commented 1 month ago

Hi @WeiCSong,

HiPhase outputs the same number of VCF files as it is given, primarily for ease of use and organization. Internally, however, there is no distinction based on filename, and all variants are treated as if they came from one big variant call file.

In the outputs, the key piece of information is the phase set (PS tag). As long as the PS tag is shared (even across files), then the phase information between variants can be compared. So in your example with chr1_1_SNV1, chr1_100_SV1, and chr1_1000_TR1: as long as the PS tag is identical, then your interpretation is correct. If the PS tags are different, then no assumptions should be made about their relative phase.

Matt
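
The PS-tag rule described above can be sketched in a few lines of Python. This is only an illustration, not part of HiPhase: the helper names (`parse_phase`, `alt_haplotypes`, `share_haplotype`) and the three example records are hypothetical, and the parser assumes tab-separated VCF data lines with a FORMAT column containing `GT` and, for phased calls, `PS`.

```python
def parse_phase(vcf_line, sample_index=0):
    """Return (chrom, pos, gt, ps) for one VCF data line; ps is None if absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    # Pair FORMAT keys (column 9) with the chosen sample's values (column 10+).
    sample = dict(zip(fields[8].split(":"), fields[9 + sample_index].split(":")))
    gt = sample.get("GT", ".")
    ps = sample.get("PS", ".")
    return chrom, pos, gt, (None if ps == "." else ps)


def alt_haplotypes(gt):
    """Return the haplotype indices carrying a non-reference allele, or None if unphased."""
    if "|" not in gt:
        return None
    return {i for i, allele in enumerate(gt.split("|")) if allele not in ("0", ".")}


def share_haplotype(rec_a, rec_b):
    """True/False if the alt alleles do/do not share a haplotype; None if not comparable."""
    chrom_a, _, gt_a, ps_a = rec_a
    chrom_b, _, gt_b, ps_b = rec_b
    # Relative phase is only meaningful inside one phase set on one chromosome.
    if chrom_a != chrom_b or ps_a is None or ps_a != ps_b:
        return None
    haps_a, haps_b = alt_haplotypes(gt_a), alt_haplotypes(gt_b)
    if haps_a is None or haps_b is None:
        return None
    return bool(haps_a & haps_b)


# Hypothetical records, one from each output VCF (made-up alleles and PS values).
snv = parse_phase("chr1\t1\t.\tA\tG\t.\tPASS\t.\tGT:PS\t1|0:1")
sv = parse_phase("chr1\t100\t.\tN\t<DEL>\t.\tPASS\t.\tGT:PS\t1|0:1")
tr = parse_phase("chr1\t1000\t.\tC\tCAGCAG\t.\tPASS\t.\tGT:PS\t1|0:900")

print(share_haplotype(snv, sv))  # True: same PS, alt allele on the same haplotype
print(share_haplotype(snv, tr))  # None: different PS, so no assumption can be made
```

Returning `None` rather than `False` for mismatched phase sets keeps "known to be on different haplotypes" separate from "relative phase unknown", which is the distinction the answer draws.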

WeiCSong commented 1 month ago

Thanks for the information!