biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

Problems hgvs between build 37 and 38 #718

Closed jeantristanb closed 8 months ago

jeantristanb commented 8 months ago

Dear developers, my question may be naive, as I don't understand everything yet. I have a list of c. variants that I want to convert to g. variants on two builds using c_to_g. When I create the mapper with vm = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name=ref) and ref = 'GRCh37', it works. With ref = 'GRCh38' and the same transcript variant, it fails. For instance:

varchr = 'NM_004360.3:c.1214A>G'
var = hgvsparser.parse_hgvs_variant(varchr)
vm.c_to_g(normalizer.normalize(var))

Traceback (most recent call last):
  File "", line 1, in
  File "/home/jeantristanb/Dropbox/SBIMB/WindowsWork/Tabitha/data_tybergerg/formthgvs_poschr/venv/lib/python3.11/site-packages/hgvs/assemblymapper.py", line 114, in c_to_g
    alt_ac = self._alt_ac_for_tx_ac(var_c.ac)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jeantristanb/Dropbox/SBIMB/WindowsWork/Tabitha/data_tybergerg/formthgvs_poschr/venv/lib/python3.11/site-packages/hgvs/assemblymapper.py", line 178, in _alt_ac_for_tx_ac
    raise HGVSDataNotAvailableError(
hgvs.exceptions.HGVSDataNotAvailableError: No alignments for NM_004360.3 in GRCh38 using splign

I imagine I need to convert my transcript between builds? Or am I missing something? I'm just beginning to work on this problem and don't understand everything yet.

thank you

andreasprlic commented 8 months ago

UTA does not contain an alignment for that specific transcript minor version on that reference assembly. You could try with one minor version up? Looks like NM_004360.4 is available for both NC_000016.9 and NC_000016.10.
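
A minimal sketch of the "try one minor version up" idea, in case it helps with a whole list of variants. The helper names (`candidate_versions`, `map_first_available`) and the `map_one` callable are hypothetical and not part of hgvs; the mapping step is passed in so the logic does not depend on any particular data provider. Note the assumption that the c. coordinate still means the same thing on the newer transcript version, which is not guaranteed and should be checked for each transcript.

```python
import re

# Splits 'NM_004360.3:c.1214A>G' into accession base, version, and the rest.
_AC_RE = re.compile(r"^(?P<base>[A-Z]{2}_\d+)\.(?P<version>\d+)(?P<rest>:.*)$")


def candidate_versions(hgvs_c, max_bumps=2):
    """Return the variant string as given, then with later transcript versions."""
    m = _AC_RE.match(hgvs_c)
    if m is None:
        raise ValueError(f"Cannot find a versioned accession in {hgvs_c!r}")
    base, version, rest = m["base"], int(m["version"]), m["rest"]
    return [f"{base}.{version + i}{rest}" for i in range(max_bumps + 1)]


def map_first_available(hgvs_c, map_one, max_bumps=2, not_available=Exception):
    """Try the original version first, then newer ones, until map_one succeeds.

    map_one: callable taking a variant string and returning the mapped variant.
    not_available: exception type that signals "no alignment for this version".
    Returns (variant_string_actually_used, mapped_result).
    """
    last_error = None
    for candidate in candidate_versions(hgvs_c, max_bumps):
        try:
            return candidate, map_one(candidate)
        except not_available as exc:
            last_error = exc
    raise last_error
```

With hgvs, `map_one` could be something like `lambda s: vm.c_to_g(hgvsparser.parse_hgvs_variant(s))` and `not_available` could be `hgvs.exceptions.HGVSDataNotAvailableError`, the exception shown in the traceback above. Returning the variant string that was actually used makes it visible when a different transcript version had to be substituted.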

jeantristanb commented 8 months ago

Thank you. Is there no way to do the conversion in hgvs?

davmlaw commented 8 months ago

Hi, an alternative is to use cdot as a transcript provider instead of UTA:

import hgvs.parser  # import the submodule explicitly so hgvs.parser.Parser() resolves
from hgvs.assemblymapper import AssemblyMapper
from cdot.hgvs.dataproviders import JSONDataProvider, RESTDataProvider

hdp = RESTDataProvider()  # Uses API server at cdot.cc

am_37 = AssemblyMapper(hdp, assembly_name='GRCh37', alt_aln_method='splign', replace_reference=True)
am_38 = AssemblyMapper(hdp, assembly_name='GRCh38', alt_aln_method='splign', replace_reference=True)

hp = hgvs.parser.Parser()
var_c = hp.parse_hgvs_variant('NM_004360.3:c.1214A>G')

var_g_37 = am_37.c_to_g(var_c)
var_g_38 = am_38.c_to_g(var_c)

print(f"{var_c} => {var_g_37} (37)")
print(f"{var_c} => {var_g_38} (38)")

Output

NM_004360.3:c.1214A>G => NC_000016.9:g.68847292A>G (37)
NM_004360.3:c.1214A>G => NC_000016.10:g.68813389A>G (38)
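
Since the original question was about a list of c. variants, here is a rough sketch of looping over several variants and both assemblies while recording failures instead of stopping at the first one. The function name `map_to_assemblies` and its parameters are hypothetical; the only thing it assumes about each mapper is a `c_to_g` method, as on `AssemblyMapper`.

```python
def map_to_assemblies(hgvs_c_list, parse, mappers, not_available=Exception):
    """Map each c. variant string with every mapper, keeping going on errors.

    parse: callable turning a variant string into a variant object.
    mappers: dict of label -> object with a c_to_g(variant) method.
    not_available: exception type to treat as "no data for this transcript".
    Returns one dict per input variant; failed mappings are stored as None,
    with the error message under '<label>_error'.
    """
    rows = []
    for text in hgvs_c_list:
        var_c = parse(text)
        row = {"c": text}
        for label, mapper in mappers.items():
            try:
                row[label] = str(mapper.c_to_g(var_c))
            except not_available as exc:
                row[label] = None
                row[label + "_error"] = str(exc)
        rows.append(row)
    return rows
```

With the objects from the snippet above, a call could look like `map_to_assemblies(my_variants, hp.parse_hgvs_variant, {"GRCh37": am_37, "GRCh38": am_38}, not_available=hgvs.exceptions.HGVSDataNotAvailableError)`, where `my_variants` is your own list of strings.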

jeantristanb commented 8 months ago

Thank you, it works very well.