Closed hangsuUNC closed 3 months ago
Hi Hang Su,
There may be a couple of issues here, so I'd like to clarify some things:
Let me know if you have any follow-up questions,
Matt
Hi,
For part 1: Do you recommend simply switching all input GTs from phased `|` to unphased `/`? Will any of the variants be unused by HiPhase? If so, can we assume that any remaining `/` GTs in the output VCF are not useful?
For part 2: Are all of the incompatible variants considered equally well supported by the reads? How would you recommend chaining together a single haplotype for each phase in the output of HiPhase?
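To make the part 1 question concrete, here is a minimal sketch of what "switching all input GTs from phased to unphased" could look like mechanically. It assumes a tab-delimited VCF record and only touches the GT subfield; the function name `unphase_record` is hypothetical, and this is not a statement of what HiPhase expects or recommends.

```python
def unphase_record(line):
    """Return a VCF record line with every sample GT rewritten from '|' to '/'.

    Assumption: fields are tab-delimited, as in a standard VCF file.
    Header lines and records without sample columns are returned unchanged.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line
    keys = fields[8].split(":")
    if "GT" not in keys:
        return line
    gt_index = keys.index("GT")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        if gt_index < len(values):
            # Only the separator changes; allele order is kept as-is.
            values[gt_index] = values[gt_index].replace("|", "/")
        fields[col] = ":".join(values)
    # Other FORMAT values (including any existing PS) are left untouched.
    return "\t".join(fields) + newline
```

Applied line by line to the input VCF, this would drop the prior phase information from the genotypes while leaving every other column as it was.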
Matt
Going to close this for now, feel free to re-open if there are additional related questions / clarifications!
Hi Matt,
I have a question about the phasing of multiallelic sites. HiPhase is capable of phasing multiallelic sites. Does this mean it also ensures the consistency of the final haplotypes, so that, for example, bcftools consensus would find no overlapping variants within a single haplotype?
Over the past few months, we have been trying to jointly phase small variants and SVs in a cohort, using integrated SV calls and a DeepVariant callset as inputs. We found that HiPhase generates a number of inconsistent haplotypes, i.e. overlapping variants assigned to the same haplotype by the phasing. For example, before HiPhase:
`chr1 100326507 chr1-100326508-INS-72 T TTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCTTTCTTTCCTTCTTTCTTTCCTTCC 1 PASS AD_ALL=4;AD_NON_ALT=6;CALIBRATION_SENSITIVITY=0.3791;CollapseId=6470.5;GQ=48;HOM_REF=0,39;HOM_TIG=0,47;ID=chr1-100326508-INS-72;NumCollapsed=8;NumConsolidated=13;QUERY_STRAND=-;SCORE=0.8911;SQ=89;SUPP_PAV=1;SUPP_PBSV=1;SUPP_SNIFFLES=1;SVLEN=72;SVTYPE=INS;TIG_REGION=h1tg011111l:37346-37417;calibration;extracted;training;DP=98;AC=1;AN=2 GT:SQ:GQ:PG:DP:AD:ZS:SS:SCORE:CALIBRATION_SENSITIVITY:SUPP_PBSV:SUPP_SNIFFLES:SUPP_PAV 1|0:58:58:1001:10:7,3:100,100:98,95:0.8225:0.6342:.:.:1
chr1 100326507 chr1-100326508-INS-76 T TTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCTTTCTTTCCTTCTTTCTTTCCTTCCTTCC 1 PASS AD_ALL=2;AD_NON_ALT=0;CALIBRATION_SENSITIVITY=0.3087;CollapseId=6470.3;GQ=5;HOM_REF=0,39;HOM_TIG=0,47;ID=chr1-100326508-INS-76;NumCollapsed=6;NumConsolidated=10;QUERY_STRAND=+;SCORE=0.9063;SQ=61;SUPP_PAV=1;SUPP_PBSV=1;SUPP_SNIFFLES=1;SVLEN=76;SVTYPE=INS;TIG_REGION=h1tg023887l:3682-3757;DP=128;AC=1;AN=2 GT:SQ:GQ:PG:DP:AD:ZS:SS:SCORE:CALIBRATION_SENSITIVITY:SUPP_PBSV:SUPP_SNIFFLES:SUPP_PAV 0|1:100:12:1001:10:3,7:100,100:98,95:0.8626:0.4657:1:1:1`
After HiPhase:
`chr1 100326507 chr1-100326508-INS-72 T TTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCTTTCTTTCCTTCTTTCTTTCCTTCC 1 PASS AD_ALL=4;AD_NON_ALT=6;CALIBRATION_SENSITIVITY=0.3791;CollapseId=6470.5;GQ=48;HOM_REF=0,39;HOM_TIG=0,47;ID=chr1-100326508-INS-72;NumCollapsed=8;NumConsolidated=13;QUERY_STRAND=-;SCORE=0.8911;SQ=89;SUPP_PAV=1;SUPP_PBSV=1;SUPP_SNIFFLES=1;SVLEN=72;SVTYPE=INS;TIG_REGION=h1tg011111l:37346-37417;calibration;extracted;training;DP=105252;AN=2;AC=1 GT:SQ:GQ:PG:DP:AD:ZS:SS:SCORE:CALIBRATION_SENSITIVITY:SUPP_PBSV:SUPP_SNIFFLES:SUPP_PAV:PS 1|0:58:58:1001:10:7,3:100,100:98,95:0.8225:0.6342:.:.:1:100000723
chr1 100326507 chr1-100326508-INS-76 T TTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCCTTCTTTCTTTCCTTCTTTCTTTCCTTCCTTCC 1 PASS AD_ALL=2;AD_NON_ALT=0;CALIBRATION_SENSITIVITY=0.3087;CollapseId=6470.3;GQ=5;HOM_REF=0,39;HOM_TIG=0,47;ID=chr1-100326508-INS-76;NumCollapsed=6;NumConsolidated=10;QUERY_STRAND=+;SCORE=0.9063;SQ=61;SUPP_PAV=1;SUPP_PBSV=1;SUPP_SNIFFLES=1;SVLEN=76;SVTYPE=INS;TIG_REGION=h1tg023887l:3682-3757;DP=137472;AN=2;AC=1 GT:SQ:GQ:PG:DP:AD:ZS:SS:SCORE:CALIBRATION_SENSITIVITY:SUPP_PBSV:SUPP_SNIFFLES:SUPP_PAV:PS 1|0:100:12:1001:10:3,7:100,100:98,95:0.8626:0.4657:1:1:1:100000723`
These two alternative alleles are now phased onto the same haplotype: the GT of INS-76 changed from 0|1 to 1|0, matching INS-72 in the same phase set. Is this expected behavior for HiPhase? If not, can you suggest how to deal with these inconsistent phasing results?
Thanks,
Hang Su
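For anyone wanting to count how often this happens in their own output, below is a minimal sketch of a check for the situation described above: two non-reference alleles placed on the same haplotype of the same phase set whose reference spans overlap. It is not part of HiPhase or bcftools. The function name `find_haplotype_overlaps` is hypothetical, and the sketch assumes a position-sorted, single-sample VCF with explicit REF/ALT sequences (symbolic alleles such as `<DEL>` would need their END handled separately).

```python
def find_haplotype_overlaps(vcf_lines, sample_column=9):
    """Return (earlier_id, later_id, haplotype_index) for overlapping phased ALTs.

    Assumptions: records are sorted by position within each chromosome, and
    the reference span of a variant is POS .. POS + len(REF) - 1.
    """
    furthest = {}   # (chrom, phase set, haplotype) -> (reference end, variant id)
    conflicts = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()  # accepts tabs or the spaces seen in pasted records
        chrom, pos, var_id, ref = fields[0], int(fields[1]), fields[2], fields[3]
        sample = dict(zip(fields[8].split(":"), fields[sample_column].split(":")))
        gt = sample.get("GT", ".")
        if "|" not in gt:
            continue  # unphased genotypes cannot be assigned to a haplotype
        phase_set = sample.get("PS", ".")
        end = pos + len(ref) - 1
        for hap, allele in enumerate(gt.split("|")):
            if allele in ("0", "."):
                continue  # reference or missing allele on this haplotype
            key = (chrom, phase_set, hap)
            prev = furthest.get(key)
            if prev is not None and pos <= prev[0]:
                conflicts.append((prev[1], var_id, hap))
            if prev is None or end > prev[0]:
                furthest[key] = (end, var_id)
    return conflicts
```

On the two post-HiPhase records above, both insertions start at the same position with GT 1|0 in phase set 100000723, so the second record would be reported as conflicting with the first on haplotype index 0. Only the furthest-reaching earlier variant is tracked per haplotype, so each later variant is reported against at most one earlier variant.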