Open reece opened 6 years ago
See #267, and particularly this comment:
hgvs-guess-plausible-transcripts works like this:
(3.6) snafu$ ./misc/experimental/hgvs-guess-plausible-transcripts 'HFE2:c.187_188insGAG' 'TNFRSF1A:c.123T>C' 'TNFRSF1A:n.426T>C'
HFE2:c.187_188insGAG 5 NM_213653.3:c.187_188insGAG NM_202004.3:c.187_188insGAG NM_145277.4:c.187_188insGAG NM_001316767.1:c.187_188insGAG NM_213652.3:c.187_188insGAG
TNFRSF1A:c.123T>C 1 NM_001065.3:c.123T>C
TNFRSF1A:n.426T>C 1 NR_144351.1:n.426T>C
For each quasi-variant on the command line, the script constructs the variant on all of the transcripts for the named gene. If the variant is considered valid (in the hgvs validator sense), then it's displayed. The columns above are: input variant, number of results, and the list of results (all tab-separated).
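The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `guess_plausible`, `format_row`, `TOY_TRANSCRIPTS` and `TOY_VALID` are hypothetical names, the transcript lookup and validator are passed in as plain callables rather than real hgvs objects, and the toy data is limited to the TNFRSF1A results shown in the output above.

```python
from typing import Callable, Iterable, List

# Toy stand-ins (assumptions for illustration only). A real tool would ask a
# transcript database for the gene's transcripts and use the hgvs validator.
TOY_TRANSCRIPTS = {"TNFRSF1A": ["NM_001065.3", "NR_144351.1"]}
TOY_VALID = {"NM_001065.3:c.123T>C", "NR_144351.1:n.426T>C"}


def guess_plausible(
    quasi_variant: str,
    transcripts_for_gene: Callable[[str], Iterable[str]],
    is_valid: Callable[[str], bool],
) -> List[str]:
    """Project a gene-based quasi-variant onto each transcript of the gene
    and keep only the expressions the validator accepts."""
    gene, sep, posedit = quasi_variant.partition(":")
    if not sep:
        raise ValueError("expected GENE:variant, got %r" % quasi_variant)
    candidates = ["%s:%s" % (ac, posedit) for ac in transcripts_for_gene(gene)]
    return [c for c in candidates if is_valid(c)]


def format_row(quasi_variant: str, results: List[str]) -> str:
    """Tab-separated row: input variant, number of results, results."""
    return "\t".join([quasi_variant, str(len(results))] + list(results))


def toy_lookup(gene: str) -> List[str]:
    # Unknown genes simply yield no candidate transcripts.
    return TOY_TRANSCRIPTS.get(gene, [])


def toy_is_valid(expr: str) -> bool:
    return expr in TOY_VALID


if __name__ == "__main__":
    for qv in ["TNFRSF1A:c.123T>C", "TNFRSF1A:n.426T>C"]:
        print(format_row(qv, guess_plausible(qv, toy_lookup, toy_is_valid)))
```

Injecting the lookup and validator keeps the sketch independent of any particular data source; a gene with no known transcripts just produces a row with a count of 0.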
Hi, I've had to implement this, and after putting it in front of medical scientists, I found the "generate everything" approach was slow and not that helpful to them. What they appear to want is the canonical transcript, which nowadays is the MANE transcript.
The RefSeq and Ensembl GTFs have tags on them, e.g. MANE_Select for GRCh38 or RefSeq Select for GRCh37.
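As a rough sketch of that idea, the snippet below picks a gene's tagged transcripts out of GTF lines. It is only an illustration under assumptions: the attribute keys (`gene_name`, `transcript_id`, `tag`) and the exact tag spellings vary between RefSeq and Ensembl releases, so check them against the file you actually use. The function names and the sample lines (with made-up IDs such as TX_A.1) are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Set

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(attr_field: str) -> Dict[str, List[str]]:
    """Parse GTF column 9 into key -> list of values (keys such as 'tag' repeat)."""
    attrs = {}  # type: Dict[str, List[str]]
    for key, value in ATTR_RE.findall(attr_field):
        attrs.setdefault(key, []).append(value)
    return attrs


def _norm(tag: str) -> str:
    # Treat "MANE Select", "MANE_Select" and "MANE_select" as the same tag.
    return tag.lower().replace(" ", "_")


def select_tagged_transcripts(
    gtf_lines: Iterable[str],
    gene: str,
    wanted_tags: Set[str],
    gene_key: str = "gene_name",  # assumption: differs between GTF sources
) -> Iterator[str]:
    """Yield transcript IDs for `gene` that carry any of `wanted_tags`."""
    wanted = {_norm(t) for t in wanted_tags}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = parse_attributes(fields[8])
        if gene not in attrs.get(gene_key, []):
            continue
        if wanted & {_norm(t) for t in attrs.get("tag", [])}:
            for tx in attrs.get("transcript_id", [])[:1]:
                yield tx


def _demo_line(gene: str, tx: str, tags: List[str]) -> str:
    # Build a made-up GTF transcript record for the demo.
    attrs = 'gene_name "%s"; transcript_id "%s"; ' % (gene, tx)
    attrs += " ".join('tag "%s";' % t for t in tags)
    return "\t".join(["chr1", "demo", "transcript", "100", "200", ".", "+", ".", attrs])


DEMO_GTF = [
    "#!made-up header",
    _demo_line("GENE1", "TX_A.1", ["basic", "MANE_Select"]),
    _demo_line("GENE1", "TX_B.1", ["basic"]),
    _demo_line("GENE2", "TX_C.1", ["MANE_Select"]),
]

if __name__ == "__main__":
    print(list(select_tagged_transcripts(DEMO_GTF, "GENE1", {"MANE_Select"})))
```

A gene-based variant could then be projected onto just the selected transcript instead of every transcript for the gene, falling back to the full list when no tagged transcript is found.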
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.
Unfortunately, some authors generate bogus HGVS expressions that use gene names rather than reference sequences. This issue should provide functionality to generate plausible expressions.
For example, TNFRSF1A has 4 transcripts at the site of rs104895271. So, for a gene-based variant like TNFRSF1A:c.123T>C, return NM_001065.3:c.123T>C, and for TNFRSF1A:n.426T>C, return NR_144351.1:n.426T>C. In general, there might be zero or more plausible variants for a given input.
See code in misc/experimental/hgvs-guess-plausible-transcripts.