PacificBiosciences / HiPhase

Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads

[Question] information in the filter column of vcfs #33

Closed: hangsuUNC closed this issue 7 months ago

hangsuUNC commented 7 months ago

Hi Matt,

Hope you are well!

When parsing the VCF, does HiPhase consider the information in the FILTER column? I'm wondering whether only variants labeled "PASS" or "." are read into HiPhase, or whether all variants are taken into consideration.

In other words, when filtering variants, should we delete all low-quality records, or is it possible to keep the records and annotate them as excluded?

Thanks,

Hang

holtjma commented 7 months ago

Hi Hang,

HiPhase does not currently look at the FILTER column, but it does filter on the GQ field if available. So in this case it should read in "PASS", ".", or really anything, as long as the VCF parser can parse it. If the GQ field does not cover what you want to remove, I would recommend pre-filtering the variants in your VCF before running HiPhase.
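Because FILTER is not read, records that are only annotated as failing would still be considered, so they need to be removed beforehand. A minimal Python sketch of such a pre-filter, assuming an uncompressed, tab-delimited VCF: it passes header lines through and keeps only records whose FILTER value is "PASS" or ".". The function name `prefilter_vcf_lines` is hypothetical and not part of HiPhase.

```python
from typing import Iterable, Iterator

# FILTER values to keep; treating "." (no filters applied) as passing is an assumption of this sketch.
KEEP_FILTERS = {"PASS", "."}


def prefilter_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only records whose FILTER is PASS or '.'."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information (##) and column-header (#CHROM) lines pass through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample...
        if len(fields) > 6 and fields[6] in KEEP_FILTERS:
            yield line
```

Usage: `with open("in.vcf") as fin, open("filtered.vcf", "w") as fout: fout.writelines(prefilter_vcf_lines(fin))`. For bgzipped VCFs, `bcftools view -f .,PASS` applies the same FILTER rule without custom code.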

For reference, here's the function that checks whether a variant is phasable, which currently checks zygosity, GQ, and variant type: https://github.com/PacificBiosciences/HiPhase/blob/99b92ac47388347614bca1dc951a5c4086744558/src/block_gen.rs#L115
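The three checks named above can be sketched in Python as follows. This is a hypothetical illustration, not a port of the linked Rust function: the `min_gq` threshold, the string variant-type labels, and the allowed set are assumptions. As in the reply, FILTER is never consulted.

```python
from typing import Optional

# Hypothetical variant-type labels; HiPhase's own categories may differ.
DEFAULT_ALLOWED_TYPES = frozenset({"SNV", "INDEL", "SV", "TR"})


def is_heterozygous(gt: str) -> bool:
    """True when the genotype has two called alleles that differ, e.g. 0/1 or 1|2."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False
    return alleles[0] != alleles[1]


def is_phasable(
    gt: str,
    gq: Optional[int],
    variant_type: str,
    min_gq: int = 0,
    allowed_types: frozenset = DEFAULT_ALLOWED_TYPES,
) -> bool:
    # Zygosity: only heterozygous calls carry phase information.
    if not is_heterozygous(gt):
        return False
    # GQ: applied only when the field is present in the record.
    if gq is not None and gq < min_gq:
        return False
    # Variant type: restricted to the (assumed) supported categories.
    return variant_type in allowed_types
```

A record with a low GQ is rejected here, while one with FILTER=LowQual but an adequate GQ would still pass, which is why pre-filtering is suggested.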

Matt

hangsuUNC commented 7 months ago

Thanks so much!