biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0
245 stars 94 forks source link

bad p. for silent changes #317

Closed reece closed 7 years ago

reece commented 8 years ago

Originally reported by: Vivien Deshaies (Bitbucket: vdeshaies, GitHub: viv-1)


Hello,

It seems that there is a problem with p. notation in the module for silent changes.

Example

#!python
> variantmapper = hgvs.variantmapper.EasyVariantMapper(hdp, primary_assembly='GRCh37')

> str(variantmapper.c_to_p(hgvsparser.parse_hgvs_variant('NM_000059.3:c.7791A>G')))
'NP_000050.2:p.(=)'

According to hgvs recommendations this case should be reported as "NP_000050.2:p.Lys2597="

This problem is present in 0.4.4 version. Do you plan to fix this ? Do you want me to create a patch ?

Best regards,

Vivien


reece commented 8 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


I've decided to close this issue since it solves the major issue. I opened issue #370 to handle the case of multicodon changes.

reece commented 8 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


This issue is nearly complete. There's one corner case for us to consider. See

https://bitbucket.org/biocommons/hgvs/src/64b12c9dca154a735711f591948a947c400fc9d6/tests/test_issues.py?at=default&fileviewer=file-view-default#test_issues.py-118

The question is about what should happen when there's an MNV that spans multiple AAs. In that example, a dinucleotide MNV spans the 3rd and 1st positions of adjacent exons. Currently, hgvs translates this as p.(Gly7=), even though Arg8 was also changed. I believe this should read as p.(Arg7_Glu8=). If not that, then perhaps we should be returning p.(=) instead. (As Martijn correctly noted, p.(=) is valid, but I suspect we all prefer the specificity of the located variant forms.)

Comments?

reece commented 8 years ago

Original comment by Martijn Vermaat (Bitbucket: martijnvermaat, GitHub: martijnvermaat):


That page reads (emphasis mine):

Description of so called "silent" changes can be described using p.(Leu54=).

That's not the same as should. The page also reads:

p.(=) - protein has not been analysed, but no change is expected

Which I think applies to this case, so p.(=) would (also) be valid.

That said, HGVS on protein level is often confusing to me, and on the new website the wording is again different (I think the Protein / Substitution page applies).

By the way, p.Lys2597= should include parentheses (meaning not tested): p.(Lys2597=).

reece commented 8 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Hi Vivien- I always appreciate patches/PRs! Thanks, Reece

reece commented 8 years ago

Original comment by Vivien Deshaies (Bitbucket: vdeshaies, GitHub: viv-1):


Here is the link to the corresponding hgvs recommendation : http://www.hgvs.org/mutnomen/recs-prot.html