biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0
240 stars 94 forks source link

Liftover hgvs nomenclature from hg19 to hg38 #703

Closed vlakhujani closed 12 months ago

vlakhujani commented 12 months ago

I am wondering if there is any way to liftover the HGVS from hg19 to hg38 ?

I am interested in lifting over protein(p.) and genomic(g.) nomenclatures using this package. I went through the documentation, but could not find what I am looking for. If someone can guide me to an example or link, it will be great.

Below are few example HGVS (hg19) where I want to liftover the coords.

p.Arg379Cys
chr10:g.100017532G>A
c.1135C>T | Arg379Cys
davmlaw commented 12 months ago

Hi, for next time, this is a question not a bug report, so would be better suited to a forum like SeqAnswers, BioStars or Bioinformatics Stack Exchange

Lifting over using the HGVS library will probably be slow. You may want to use other dedicated liftover tools (google for them) eg ClinGen Allele Registry UCSC liftover, though please read this UCSC FAQ on snp liftover

I am not sure what you mean by lifting over proteins - they should be the same for each build if the build has the same transcripts available.

But if you really want to do it, here's some example code:

from bioutils.assemblies import make_name_ac_map
from hgvs.assemblymapper import AssemblyMapper
from hgvs.dataproviders.uta import connect
from hgvs.exceptions import HGVSError
from hgvs.parser import Parser

# You appear to be using chromosome names instead of contigs, so here's a quick way to convert them

class ChromContigConverter:
    def __init__(self, genome_build):
        self.name_ac_map = make_name_ac_map(genome_build)

    def chrom_to_contig(self, chrom):
        chrom = chrom.lower()
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        return self.name_ac_map.get(chrom)

def liftover(var_g, src_am, dest_am):
    alignments_for_region = hdp.get_alignments_for_region(var_g.ac, var_g.posedit.pos.start.base, var_g.posedit.pos.end.base)
    for afr in alignments_for_region:
        try:
            tx_ac = afr["tx_ac"] # If you want to handle proteins you'd use the tx_ac from here
            var_c = src_am.g_to_c(var_g, tx_ac)
            return dest_am.c_to_g(var_c)
        except HGVSError:
            pass

    raise HGVSError(f"Could not convert {var_g} to {dest_am.assembly_name}")

hdp = connect()

am37 = AssemblyMapper(hdp,
                      assembly_name='GRCh37',
                      alt_aln_method='splign')

am38 = AssemblyMapper(hdp,
                      assembly_name='GRCh38',
                      alt_aln_method='splign')

hp = Parser()
chrom_converter = ChromContigConverter("GRCh37")

var_g = hp.parse_hgvs_variant("chr10:g.100017532G>A")
var_g.ac = chrom_converter.chrom_to_contig(var_g.ac)

var_g_38 = liftover(var_g, am37, am38)

print(f"Lifted over {var_g} => {var_g_38}")        

Output:

Lifted over NC_000010.10:g.100017532G>A => NC_000010.11:g.98257775G>A

Good luck!