biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

review causes of failed clinvar tests #380

Closed reece closed 7 months ago

reece commented 8 years ago

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


361 created new tests using ClinVar. These tests cover GRCh37 and GRCh38.

During the selection of tests, it was discovered that ~10% of the expected outputs did not match those of hgvs. Examples are:

Some of the ClinVar variants are clearly wrong (TerTer, for example). However, others reflect (at least) shortcomings of the hgvs package and may be bugs.

The file tests/data/clinvar.gz contains tests commented out with the reasons for each.
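As a starting point for triage, a minimal sketch that counts active versus commented-out lines in a gzipped test file. It assumes that disabled tests are marked with a leading `#`; the actual layout of tests/data/clinvar.gz is not described here, and `count_disabled_tests` is a hypothetical helper, not part of hgvs.

```python
import gzip


def count_disabled_tests(path, comment_prefix="#"):
    """Return (active, disabled) line counts for a gzipped text file.

    Assumption: a disabled test is any non-blank line starting with
    comment_prefix. Header comments, if any, would be counted as disabled too.
    """
    active = 0
    disabled = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped:
                continue  # skip blank lines
            if stripped.startswith(comment_prefix):
                disabled += 1
            else:
                active += 1
    return active, disabled
```

Calling `count_disabled_tests("tests/data/clinvar.gz")` from the repository root would give a rough measure of how many cases still need review.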

The goal for this issue is to triage these errors and, as necessary, create new issues to address problems.


deannachurch commented 4 years ago

Update to this. I'm parsing a clinvar file now and got this error:

HGVSParseError: NM_007194.4(CHEK2):c.1135_1136TC[2]: char 32: expected the character '='

I checked the recommendations and the c.1135_1136TC[2] looks ok per https://varnomen.hgvs.org/recommendations/RNA/variant/repeated/
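When processing a large ClinVar file, it can help to set aside repeat-style variants before handing the rest to the parser. A minimal sketch, assuming that repeat notation always ends in a bracketed copy count such as `TC[2]`; the regular expression and both helper names are hypothetical and are not part of hgvs.

```python
import re

# Assumption: a repeated-unit edit ends with an optional base string and a
# bracketed copy count, e.g. "TC[2]". This is a triage heuristic, not a parser.
REPEAT_SUFFIX = re.compile(r"[ACGTUacgtu]*\[\d+\]$")


def uses_repeat_notation(variant: str) -> bool:
    """Return True if the variant string ends in bracketed repeat notation."""
    return REPEAT_SUFFIX.search(variant.strip()) is not None


def split_by_repeat_notation(variants):
    """Partition variant strings into (repeat_style, other), preserving order."""
    repeat_style, other = [], []
    for v in variants:
        (repeat_style if uses_repeat_notation(v) else other).append(v)
    return repeat_style, other
```

The `other` list can then be passed to the parser, while the `repeat_style` list is logged for later review.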

deannachurch commented 4 years ago

Another ClinVar failure. Parsing this variant: NM_007194.4(CHEK2):c.3G>T

To get the protein variant, hgvs returns this: NP_009125.1:p.Met1?

But, it should be: NP_009125.1:p.Met1Ile (from ClinVar)

akeeeshi commented 4 years ago

Hi @deannachurch

Per past discussions here (#566) and on the VariantValidator project (#86), any mutation in the start codon should be reported as M1?, because the amino acid change fundamentally disrupts the translation signal. Without a valid start codon, the effect is unknown.

The HGVS society also addressed this question in a recent post (https://www.facebook.com/HGVSmutnomen/posts/2430762803629529).
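The rule can be sketched in a few lines. This is an illustration of the convention only, not how hgvs implements it; both function names are hypothetical, and the sketch assumes plain CDS coordinates (c.1 is the A of the ATG, there is no c.0, and intronic offsets are ignored).

```python
def overlaps_start_codon(c_start: int, c_end: int) -> bool:
    """True if the CDS interval [c_start, c_end] touches c.1 through c.3."""
    return c_start <= 3 and c_end >= 1


def start_codon_aware_label(c_start: int, c_end: int, predicted: str) -> str:
    """Return 'p.Met1?' for start-codon variants, else the predicted label.

    `predicted` is whatever protein change a naive translation would give.
    """
    if overlaps_start_codon(c_start, c_end):
        return "p.Met1?"
    return predicted
```

For c.3G>T, `start_codon_aware_label(3, 3, "p.Met1Ile")` returns `"p.Met1?"`, which matches the behaviour described above.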

deannachurch commented 4 years ago

Thanks for the update. It would be helpful, then, if the code returned something more informative. When I try to read the protein position via pvar.posedit.pos.start.base, I get an exception rather than useful information.

Is that possible?

reece commented 4 years ago

Right now, it's not possible. But, it's highly desired.

https://github.com/biocommons/hgvs/issues/333 has an explanation of why things are the way they are.

We can definitely do better here, but it's significant work and has been lower priority than other work.

-Reece

deannachurch commented 4 years ago

I get it, and I can work around it for now. Thanks for the hard, unpaid labor here!
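The workaround itself is not described in the thread. One defensive pattern, sketched under the assumption that the failure surfaces as an `AttributeError` somewhere along the attribute chain: `get_path` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real variant objects, not hgvs classes.

```python
from types import SimpleNamespace


def get_path(obj, dotted_path, default=None):
    """Follow a dotted attribute path, returning default if any step is missing."""
    current = obj
    for name in dotted_path.split("."):
        try:
            current = getattr(current, name)
        except AttributeError:
            return default
    return current


# Stand-in objects (assumptions for illustration only).
complete = SimpleNamespace(
    posedit=SimpleNamespace(pos=SimpleNamespace(start=SimpleNamespace(base=1)))
)
incomplete = SimpleNamespace(posedit=SimpleNamespace(pos=None))

print(get_path(complete, "posedit.pos.start.base"))    # 1
print(get_path(incomplete, "posedit.pos.start.base"))  # None
```

If the real exception is of a different type, the `except` clause would need to be widened accordingly.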

github-actions[bot] commented 7 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 7 months ago

This issue was closed because it has been stalled for 7 days with no activity.