biothings / myvariant.info

MyVariant.info: A BioThings API for human variant annotations
http://myvariant.info

Use a known normalization algorithm in `utils.hgvs._normalized_vcf` function #140

Open erikyao opened 2 years ago

erikyao commented 2 years ago

There are a few VCF normalization tools/algorithms that could be used in our `_normalized_vcf` function, e.g.:

Our current implementation does not satisfy the parsimony requirement of VCF normalization:

A variant is parsimonious if and only if it is represented in as few nucleotides as possible without an allele of length 0.
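For a biallelic pair of non-empty alleles, this definition can be turned into a simple check. A pair can still be shortened exactly when both alleles are longer than one base and they share their first or last nucleotide, because removing that shared base leaves two non-empty alleles. The helper below is a hypothetical sketch (`is_parsimonious` is not an existing function in this repo) and assumes biallelic input.

```python
def is_parsimonious(ref: str, alt: str) -> bool:
    """Return True if (ref, alt) cannot be shortened any further.

    Hypothetical helper; assumes a biallelic pair of non-empty alleles.
    A pair is still reducible if both alleles are longer than one base
    and they share a leading or trailing nucleotide.
    """
    if not ref or not alt:
        raise ValueError("alleles must be non-empty")
    both_longer_than_one = len(ref) > 1 and len(alt) > 1
    shares_first = ref[0] == alt[0]
    shares_last = ref[-1] == alt[-1]
    return not (both_longer_than_one and (shares_first or shares_last))


print(is_parsimonious("TCCCCT", "CCCCT"))  # False: trailing "T" is shared
print(is_parsimonious("TC", "C"))          # True
```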

For example, our current implementation cannot handle a case like ref = "TCCCCT", alt = "CCCCT". The normalized alleles should be ref = "TC", alt = "C".
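One way to reach the parsimonious form is to trim shared bases from both ends while neither allele becomes empty: first trim the right end (which does not move the position), then trim the left end (shifting the position by one per removed base). The function below is a minimal sketch of that trimming step; `trim_alleles` and its `(pos, ref, alt)` signature are hypothetical and are not the actual interface of `utils.hgvs._normalized_vcf`. Note that full VCF normalization is usually defined as parsimonious *and* left-aligned; left-aligning an indel requires the reference sequence upstream of `pos`, which this sketch does not take.

```python
from typing import Tuple


def trim_alleles(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim shared bases so that (ref, alt) becomes parsimonious.

    Hypothetical helper. `pos` is the 1-based position of ref[0].
    Right-trimming leaves pos unchanged; left-trimming advances pos by
    one per removed base. Neither step lets an allele become empty.
    """
    # Right-trim: drop a shared trailing base while both alleles keep >= 1 base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Left-trim: drop a shared leading base; the variant now starts one base later.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_alleles(100, "TCCCCT", "CCCCT"))  # (100, 'TC', 'C')
print(trim_alleles(10, "AT", "AG"))          # (11, 'T', 'G')
```

Trimming the right end first matters for cases like the one above: the shared bases are all at the 3' end, and stopping as soon as one allele reaches length 1 yields ref = "TC", alt = "C" instead of an empty allele.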