biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

Provide utility to generate transcript variants from HGVS-ish expressions with gene names #517

Open reece opened 6 years ago

reece commented 6 years ago

Unfortunately, some authors generate bogus HGVS expressions that use gene names rather than reference sequences. This issue tracks adding functionality that generates plausible transcript-based expressions from such input.

For example, TNFRSF1A has 4 transcripts at the site of rs104895271. For a gene-based variant like TNFRSF1A:c.123T>C, return NM_001065.3:c.123T>C; for TNFRSF1A:n.426T>C, return NR_144351.1:n.426T>C. In general, there might be zero or more plausible variants for a given input.

See code in misc/experimental/hgvs-guess-plausible-transcripts.

reece commented 5 years ago

See #267, and particularly this comment:


hgvs-guess-plausible-transcripts works like this:

(3.6) snafu$ ./misc/experimental/hgvs-guess-plausible-transcripts 'HFE2:c.187_188insGAG' 'TNFRSF1A:c.123T>C' 'TNFRSF1A:n.426T>C'
HFE2:c.187_188insGAG    5   NM_213653.3:c.187_188insGAG NM_202004.3:c.187_188insGAG NM_145277.4:c.187_188insGAG NM_001316767.1:c.187_188insGAG  NM_213652.3:c.187_188insGAG
TNFRSF1A:c.123T>C   1   NM_001065.3:c.123T>C
TNFRSF1A:n.426T>C   1   NR_144351.1:n.426T>C

For each quasi-variant on the command line, the script constructs the variant on every transcript of the named gene. If the resulting variant is valid (in the hgvs validator sense), it is displayed. The columns above are: input variant, number of results, and the list of results (all tab-separated).
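
The loop described above can be sketched roughly as follows. This is not the code of hgvs-guess-plausible-transcripts; it is a minimal illustration in which the transcript lookup and the validity check are passed in as plain callables. The names `guess_plausible_variants`, `format_row`, `TOY_TRANSCRIPTS`, and `toy_is_valid` are hypothetical, and the toy validator is only a stand-in for a real hgvs validation step.

```python
from typing import Callable, Iterable, List


def guess_plausible_variants(
    quasi_variant: str,
    transcripts_for_gene: Callable[[str], Iterable[str]],
    is_valid: Callable[[str], bool],
) -> List[str]:
    """Rewrite 'GENE:posedit' onto each transcript of GENE; keep the valid ones."""
    gene, sep, posedit = quasi_variant.partition(":")
    if not (gene and sep and posedit):
        raise ValueError(f"expected 'GENE:posedit', got {quasi_variant!r}")
    candidates = (f"{ac}:{posedit}" for ac in transcripts_for_gene(gene))
    return [c for c in candidates if is_valid(c)]


def format_row(quasi_variant: str, results: List[str]) -> str:
    """Tab-separated row: input variant, number of results, then the results."""
    return "\t".join([quasi_variant, str(len(results)), *results])


# Toy stand-ins (assumptions for illustration only). In practice the lookup
# and the validity check would be backed by a transcript database and the
# hgvs validator rather than these hard-coded rules.
TOY_TRANSCRIPTS = {"TNFRSF1A": ["NM_001065.3", "NR_144351.1"]}


def toy_transcripts_for_gene(gene: str) -> List[str]:
    return TOY_TRANSCRIPTS.get(gene, [])


def toy_is_valid(expr: str) -> bool:
    # Stand-in rule: accept c. only on NM_ accessions and n. only on others.
    accession, _, posedit = expr.partition(":")
    if accession.startswith("NM_"):
        return posedit.startswith("c.")
    return posedit.startswith("n.")


if __name__ == "__main__":
    for qv in ["TNFRSF1A:c.123T>C", "TNFRSF1A:n.426T>C"]:
        hits = guess_plausible_variants(qv, toy_transcripts_for_gene, toy_is_valid)
        print(format_row(qv, hits))
```

Passing the lookup and the validator as arguments keeps the sketch independent of any particular data source; an input whose gene has no transcripts simply yields a row with a count of 0.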

davmlaw commented 1 year ago

Hi, I've had to implement this, and after putting it in front of medical scientists, I found that the "generate everything" approach was slow and not that helpful to them. What they appear to want is the canonical transcript, which nowadays is the MANE transcript.

The RefSeq and Ensembl GTFs carry tags for this, e.g. MANE_select for GRCh38 or RefSeq Select for GRCh37.
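
As a rough sketch of that idea, the snippet below picks the transcripts of a gene that carry a given tag in a GTF file. It assumes Ensembl-style attributes (`gene_name`, `transcript_id`, repeated `tag` keys); the exact attribute names and tag spelling differ between sources and releases, so the `tag` value should be treated as a parameter to check against the actual file. The function names and the toy GTF lines (including their IDs) are made up for illustration.

```python
import re
from typing import Dict, Iterable, List

# Matches GTF attributes of the form: key "value";
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(field: str) -> Dict[str, List[str]]:
    """Parse the 9th GTF column; a key such as 'tag' may occur several times."""
    attrs: Dict[str, List[str]] = {}
    for key, value in _ATTR_RE.findall(field):
        attrs.setdefault(key, []).append(value)
    return attrs


def tagged_transcripts(
    gtf_lines: Iterable[str], gene: str, tag: str = "MANE_Select"
) -> List[str]:
    """Return transcript IDs for `gene` whose transcript record carries `tag`."""
    hits: List[str] = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript-level records are inspected
        attrs = parse_gtf_attributes(cols[8])
        if gene in attrs.get("gene_name", []) and tag in attrs.get("tag", []):
            hits.extend(attrs.get("transcript_id", []))
    return hits


# Made-up records for illustration; the IDs are not real accessions.
TOY_GTF = [
    "#!toy header",
    "\t".join(["1", "toy", "transcript", "100", "900", ".", "+", ".",
               'gene_id "G1"; transcript_id "TX_A.1"; gene_name "GENE1"; tag "basic";']),
    "\t".join(["1", "toy", "transcript", "100", "950", ".", "+", ".",
               'gene_id "G1"; transcript_id "TX_B.2"; gene_name "GENE1"; tag "basic"; tag "MANE_Select";']),
    "\t".join(["1", "toy", "exon", "100", "200", ".", "+", ".",
               'gene_id "G1"; transcript_id "TX_B.2"; gene_name "GENE1"; tag "MANE_Select";']),
]

if __name__ == "__main__":
    print(tagged_transcripts(TOY_GTF, "GENE1"))
```

Restricting the generated expressions to the tagged transcript would give one result per gene instead of one per transcript, which matches the "canonical transcript" behaviour described above.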

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 9 months ago

This issue was closed because it has been stalled for 7 days with no activity.

reece commented 8 months ago

This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.