biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

How to properly handle selenoproteins #710

Open holtgrewe opened 7 months ago

holtgrewe commented 7 months ago

One example is the gene SELENON with transcript NM_206926.2. Here, an alternative translation table must be used:

Note=UGA stop codon recoded as selenocysteine

Example line from the NCBI GFF3 file:

NC_000001.10    BestRefSeq      CDS     26126722        26126904        .       +       0       ID=cds-NP_996809.1;Parent=rna-NM_206926.2;Dbxref=CCDS:CCDS41283.1,GeneID:57190,Genbank:NP_996809.1,HGNC:HGNC:15999,MIM:606210;Name=NP_996809.1;Note=UGA stop codon recoded as selenocysteine%3B isoform 1 is encoded by transcript variant 1;gbkey=CDS;gene=SELENON;product=selenoprotein N isoform 1;protein_id=NP_996809.1;transl_except=(pos:26139280..26139282%2Caa:Sec)
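For illustration, the recoding information in a line like the one above could be pulled out of the attribute column roughly as follows. This is a minimal stdlib-only sketch, not code from hgvs or UTA; the function names `parse_gff3_attributes` and `parse_transl_except` are hypothetical, and the regular expression assumes the simple plus-strand form `(pos:START..END,aa:XXX)` seen here (it does not handle `complement(...)` positions).

```python
import re
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split the GFF3 attribute column into a dict, decoding %XX escapes."""
    attrs = {}
    for field in column9.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)  # e.g. %3B -> ';', %2C -> ','
    return attrs


def parse_transl_except(value):
    """Return (start, end, amino_acid) tuples from a decoded transl_except value.

    Assumption: only the simple form "(pos:START..END,aa:XXX)" is matched.
    """
    pattern = r"\(pos:(\d+)\.\.(\d+),aa:(\w+)\)"
    return [(int(s), int(e), aa) for s, e, aa in re.findall(pattern, value)]


# Abbreviated attribute column taken from the example line above.
column9 = (
    "ID=cds-NP_996809.1;Parent=rna-NM_206926.2;"
    "Note=UGA stop codon recoded as selenocysteine%3B isoform 1 is encoded "
    "by transcript variant 1;gene=SELENON;"
    "transl_except=(pos:26139280..26139282%2Caa:Sec)"
)
attrs = parse_gff3_attributes(column9)
exceptions = parse_transl_except(attrs.get("transl_except", ""))
print(exceptions)  # [(26139280, 26139282, 'Sec')]
```

The coordinates are genomic (here on NC_000001.10), so they would still need to be mapped onto the transcript's CDS before translation.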

biocommons.bioutils already contains the proper translation table. Is this implemented anywhere in UTA or hgvs? The only matches for "seleno" in the UTA repository are under misc/EnsemblUTA/*.
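To make the expected behaviour concrete, here is a small sketch of position-specific recoding: the standard genetic code is used everywhere except at CDS offsets flagged as selenocysteine, where TGA is emitted as `U`. It is stdlib-only and deliberately does not call the bioutils or hgvs APIs; `translate` and its `sec_offsets` parameter are hypothetical names, and the input is a made-up toy sequence, not SELENON.

```python
from itertools import product

# NCBI translation table 1 (standard code), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, sec_offsets=frozenset()):
    """Translate a CDS, reading TGA as selenocysteine (U) at the given offsets.

    sec_offsets holds 0-based CDS offsets of the first base of each recoded
    codon (an assumption of this sketch; GFF3 transl_except gives genomic
    coordinates that would have to be converted first).
    """
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA input
    protein = []
    # Ignore a trailing partial codon, if any.
    for offset in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[offset:offset + 3]
        if codon == "TGA" and offset in sec_offsets:
            protein.append("U")
        else:
            protein.append(STANDARD_TABLE.get(codon, "X"))  # X for ambiguous codons
    return "".join(protein)


toy_cds = "ATGTGATAA"  # Met, TGA, stop
print(translate(toy_cds))       # M**
print(translate(toy_cds, {3}))  # MU*
```

Recoding only the flagged positions keeps a genuine terminating TGA elsewhere in the transcript as a stop, which a blanket "TGA means Sec" table would not.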

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

holtgrewe commented 4 months ago

Ping

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.