biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

Is it possible to get hgvs official name from "15-49716528-A-G"? #743

Closed shizipo closed 1 month ago

shizipo commented 3 months ago

For example, like "https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg19/15-49716528-A-G/mane_select/", which returns:

"NC_000015.9(NM_001330293.1):c.911-1617T>C"

jsstevenson commented 3 months ago

Hey @shizipo!

I don't think this is possible out of the box with HGVS. However, the VRS-Python library (which is biocommons-adjacent) includes a translator module that can ingest gnomAD-style variation descriptions and output them as HGVS strings (using the HGVS library under the hood). See https://github.com/ga4gh/vrs-python/blob/main/notebooks/getting_started/4_Exploring_the_AlleleTranslator.ipynb for more.

Code snippet from @korikuzma:

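from biocommons.seqrepo import SeqRepo
from ga4gh.vrs.extras.translator import AlleleTranslator
from ga4gh.vrs.dataproxy import SeqRepoDataProxy

# Point SeqRepo at a local snapshot; adjust the path to your installation.
sr = SeqRepo(root_dir="/usr/local/share/seqrepo/latest")
seqrepo_dataproxy = SeqRepoDataProxy(sr)
allele_translator = AlleleTranslator(data_proxy=seqrepo_dataproxy)

# Parse the gnomAD-style string into a VRS object, then render it as HGVS.
gnomad_vcf = "15-49716528-A-G"
vo = allele_translator.translate_from(gnomad_vcf)
print(allele_translator.translate_to(vo, "hgvs"))
# ['NC_000015.10:g.49716528A>G']
# Note: NC_000015.10 is the GRCh38 accession, while the question uses hg19 coordinates.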

davmlaw commented 1 month ago

Hi, it's reasonably straightforward to convert this into a genomic (g.) HGVS expression:

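from hgvs.dataproviders import uta
from hgvs.extras.babelfish import Babelfish

# Connect to a UTA database and target the GRCh37 (hg19) assembly.
hdp = uta.connect()
bf = Babelfish(hdp, 'GRCh37')

# Split the gnomAD-style string into its VCF fields.
gnomad_vcf = "15-49716528-A-G"
chrom, position, ref, alt = gnomad_vcf.split("-")
# Convert the VCF fields into a g. HGVS variant.
hgvs_g = bf.vcf_to_g_hgvs(chrom, int(position), ref, alt)
print(hgvs_g)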

Output:

NC_000015.9:g.49716528A>G
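
For a simple single-base substitution like this one, the g. string is essentially the chromosome's RefSeq accession plus position, REF and ALT. Below is a minimal standard-library sketch of that formatting, only to illustrate what the conversion produces. The function name and the accession table are hypothetical; the two accessions are the ones that appear in the outputs in this thread. It does not normalize, does not check REF against the reference sequence, and does not handle indels, so it is not a substitute for Babelfish.

```python
# Hypothetical lookup table: only chromosome 15 is filled in, using the
# accessions that appear in this thread. Other chromosomes would need adding.
CHR15_ACCESSIONS = {"GRCh37": "NC_000015.9", "GRCh38": "NC_000015.10"}


def gnomad_snv_to_g_hgvs(gnomad_vcf, assembly="GRCh37"):
    """Format a gnomAD-style SNV ("chrom-pos-ref-alt") as a g. HGVS string.

    Sketch only: supports single-base substitutions on chromosome 15.
    """
    chrom, position, ref, alt = gnomad_vcf.split("-")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    if chrom != "15":
        raise ValueError("this sketch only knows chromosome 15")
    if assembly not in CHR15_ACCESSIONS:
        raise ValueError("unknown assembly: " + assembly)
    # Restrict to simple substitutions; indels need normalization.
    for allele in (ref, alt):
        if len(allele) != 1 or allele not in "ACGT":
            raise ValueError("only single-base substitutions are supported")
    accession = CHR15_ACCESSIONS[assembly]
    return f"{accession}:g.{int(position)}{ref}>{alt}"


if __name__ == "__main__":
    print(gnomad_snv_to_g_hgvs("15-49716528-A-G"))
```

Note that the assembly matters: the same coordinates refer to different locations on GRCh37 and GRCh38, so the accession has to match the build the coordinates came from.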

The next step involves knowing which transcript is the MANE Select. This is not currently available in HGVS, but I have raised an issue for it; see #747.

You could do this yourself fairly easily, though, by downloading the MANE CSV and comparing transcripts against it.
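
A rough sketch of that comparison, using only the standard library. The column names `RefSeq_nuc` and `MANE_status`, the delimiter, and both function names are assumptions about the downloaded file and are hypothetical; check them against the header of the file you actually have.

```python
import csv


def load_mane_select(handle, delimiter=",",
                     accession_col="RefSeq_nuc", status_col="MANE_status"):
    """Return the set of transcript accessions flagged as MANE Select.

    The column names are assumptions; adjust them to match your file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[accession_col] for row in reader
            if row[status_col] == "MANE Select"}


def pick_mane_select(transcripts, mane_select, ignore_version=False):
    """Keep only the candidate transcripts found in the MANE Select set.

    With ignore_version=True, accessions are compared without the
    ".N" version suffix (e.g. NM_001330293.1 vs NM_001330293.2).
    """
    if ignore_version:
        bases = {ac.split(".")[0] for ac in mane_select}
        return [tx for tx in transcripts if tx.split(".")[0] in bases]
    return [tx for tx in transcripts if tx in mane_select]
```

The candidate list would be the transcripts overlapping the variant; in hgvs these can be obtained with `AssemblyMapper.relevant_transcripts(var_g)`. The `ignore_version` option is there because the transcript versions aligned to GRCh37 may not match the versions listed in MANE, which as far as I know is defined against GRCh38.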

Hope this answers your question.