biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

For start loss AARefAlt.format always returns 3 letter amino acid even when single letter amino acid is configured #730

Closed jPleyte closed 3 months ago

jPleyte commented 3 months ago

When working with a start-loss variant and using the AssemblyMapper to obtain the p. representation, the returned value is hard-coded to the three-letter amino acid form "Met1?", even when formatting is configured to use single-letter amino acids.

To Reproduce

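import hgvs.assemblymapper
import hgvs.dataproviders.uta
import hgvs.parser

# Setup assumed by the snippet; GRCh37 is inferred from the NC_000016.9 accession.
ASSEMBLY_VERSION = 'GRCh37'
hdp = hgvs.dataproviders.uta.connect()
hgvs_parser = hgvs.parser.Parser()

am = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name=ASSEMBLY_VERSION, alt_aln_method='splign')

var_g = hgvs_parser.parse_hgvs_variant('NC_000016.9:g.89985662_89985667del')
var_c = am.g_to_c(var_g, 'NM_002386.3')
var_p = am.c_to_p(var_c)
var_p_one_letter = var_p.format(conf={"p_3_letter": False})
var_p_three_letter = var_p.format(conf={"p_3_letter": True})

print(var_p_one_letter)    # NP_002377.4:p.Met1?
print(var_p_three_letter)  # NP_002377.4:p.Met1?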

Expected behavior

When the SequenceVariant is formatted with single-letter amino acids configured, the call to var_p.format should return a single-letter amino acid representation of the change.

Additional context

The fix for this issue should be fairly simple: in edit.py, check the configuration and return "p.Met1?" or "p.M1?" accordingly.
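As a rough illustration of the proposed check, here is a minimal standalone sketch. It is not the library's actual code: the function name format_start_loss and the default_p_3_letter parameter are hypothetical, and the only thing taken from the report is the "p_3_letter" configuration key and the "Met1?" output.

```python
def format_start_loss(conf=None, default_p_3_letter=True):
    """Return the edit string for a start-loss protein variant.

    Hypothetical helper: `conf` mirrors the dict passed to
    SequenceVariant.format(); `default_p_3_letter` stands in for
    whatever global default the library would otherwise apply.
    """
    p_3_letter = default_p_3_letter
    # A per-call setting overrides the default when it is present.
    if conf and "p_3_letter" in conf:
        p_3_letter = conf["p_3_letter"]
    # Use the three-letter or single-letter code for the initiator Met.
    return "Met1?" if p_3_letter else "M1?"


print(format_start_loss({"p_3_letter": False}))
print(format_start_loss({"p_3_letter": True}))
```

With a check like this in place, the two format calls in the reproduction above would be expected to differ instead of both returning the three-letter form.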