Closed korikuzma closed 1 year ago
The hgvs library uses the translate_cds method from bioutils, which already supports alternative translation tables (e.g., for selenoproteins). We need another translation table there for mitochondria, similar to https://github.com/biocommons/bioutils/issues/36. It then needs to be enabled in AltTranscriptData / AltSeqBuilder somehow. See a related ticket here.
RefSeq has this data for MT genes; I think this should be sufficient to offer "m_to_p". The data needs to be loaded into seqrepo / UTA so that we can conveniently access it and so that it looks similar to the rest of the data used by hgvs.
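To make the need for a separate table concrete, here is a minimal sketch of what an "m_to_p"-style projection has to compute for a single-base substitution. It does not use hgvs or bioutils: the helper name `codon_change`, the toy sequence, and the CDS-relative coordinate are all assumptions made for illustration. Only the four codon reassignments come from NCBI translation table 2.

```python
from itertools import product

# NCBI translation table 1 (standard), codons enumerated in T, C, A, G order.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}

# NCBI translation table 2 (vertebrate mitochondrial): four reassigned codons.
VMITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def codon_change(cds, cds_pos, alt, table):
    """Hypothetical helper: protein-level effect of one substitution.

    cds_pos is 1-based, relative to the first base of the start codon.
    Returns (amino-acid position, reference residue, altered residue).
    """
    offset = (cds_pos - 1) % 3          # position of the base within its codon
    start = cds_pos - 1 - offset        # 0-based index of the codon's first base
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return start // 3 + 1, table[ref_codon], table[alt_codon]


# Made-up 9-base CDS, not a real MT gene.
toy_cds = "ATGTGAAGT"
print(codon_change(toy_cds, 9, "A", STANDARD))  # AGT>AGA read with the standard code
print(codon_change(toy_cds, 9, "A", VMITO))     # AGT>AGA read with the mitochondrial code
```

With the standard table the substitution looks like a missense change (Ser to Arg); with the vertebrate mitochondrial table the same change creates a stop codon. A real implementation would also need the transcript alignment data from UTA to get from an m. coordinate to a CDS position, which this sketch skips.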
@andreasprlic Thanks! I will just copy this into the Project Details section
@andreasprlic: I don't have a good handle on exactly what it will take to implement this, but I think we're in for at least a new version of UTA, and we both know what that's like.
Would you please do the following?
Here are some test variants from ClinVar: chrMT_test_variants.csv. Coding regions: NC_012920.1_coding_regions.csv
We also need a PR into https://github.com/biocommons/bioutils to add the vertebrate mitochondrial translation table:
diff --git a/src/bioutils/sequences.py b/src/bioutils/sequences.py
index 1a2ce75..c67f966 100644
--- a/src/bioutils/sequences.py
+++ b/src/bioutils/sequences.py
@@ -221,6 +221,18 @@ dna_to_aa1_lut = { # NCBI standard translation table
dna_to_aa1_sec = dna_to_aa1_lut.copy()
dna_to_aa1_sec["TGA"] = "U"
+# Vertebrate mitochondrial translation table
+# https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG2
+
+dna_to_aa1_vmito = dna_to_aa1_lut.copy()
+dna_to_aa1_vmito["AGA"] = "*"
+dna_to_aa1_vmito["AGG"] = "*"
+dna_to_aa1_vmito["ATA"] = "M"
+dna_to_aa1_vmito["TGA"] = "W"
+
+
+
+
complement_transtable = bytes.maketrans(b"ACGT", b"TGCA")
@@ -506,6 +518,7 @@ class TranslationTable(StrEnum):
     standard = "standard"
     selenocysteine = "sec"
+    vertebrate_mitochondrial = 'vmito'
 def translate_cds(seq, full_codons=True, ter_symbol="*", translation_table=TranslationTable.standard):
@@ -596,6 +609,8 @@ def translate_cds(seq, full_codons=True, ter_symbol="*", translation_table=Trans
         trans_table = dna_to_aa1_lut
     elif translation_table == TranslationTable.selenocysteine:
         trans_table = dna_to_aa1_sec
+    elif translation_table == TranslationTable.vertebrate_mitochondrial:
+        trans_table = dna_to_aa1_vmito
     else:
         raise ValueError("Unsupported translation table {}".format(translation_table))
     seq = replace_u_to_t(seq)
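The effect of the four reassignments in the diff can be checked without bioutils. The sketch below is a standalone stand-in, not the bioutils implementation: the `translate` function and the `TABLES` lookup are hypothetical names, and only the "standard" and "vmito" keys mirror the enum values in the patch.

```python
from itertools import product

# Standard code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
standard = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}

# Same overrides as dna_to_aa1_vmito in the diff.
vmito = dict(standard)
vmito.update({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})

# Hypothetical lookup keyed by the string values used in the patch's enum.
TABLES = {"standard": standard, "vmito": vmito}


def translate(seq, table_name="standard"):
    """Translate a DNA or RNA coding sequence made of complete codons."""
    if table_name not in TABLES:
        raise ValueError("Unsupported translation table {}".format(table_name))
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("Sequence length must be a multiple of three")
    table = TABLES[table_name]
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Codons whose meaning differs between the two tables.
changed = sorted(c for c in standard if standard[c] != vmito[c])
print(changed)
print(translate("ATGATATGAAGA"), translate("ATGATATGAAGA", "vmito"))
```

The test sequence contains three of the four reassigned codons (ATA, TGA, AGA), so every residue after the start codon differs between the two translations. A unit test for the actual patch could follow the same pattern against translate_cds once the new table is available.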
This will not be worked on at the hackathon. @andreasprlic is going to merge some comments before closing.
We won't get to this issue as part of the hackathon this weekend, but we will continue on this topic afterwards as part of https://github.com/biocommons/hgvs/issues/663
Submitter Name
Andreas Prlic (@andreasprlic)
Submitter Affiliation
Invitae
Requested By
Invitae
Additional Submitter Details
No response
Lead(s)
@andreasprlic
biocommons Repo
hgvs
Project Details
The hgvs library uses the translate_cds method from bioutils, which already supports alternative translation tables (e.g., for selenoproteins). We need another translation table there for mitochondria, similar to https://github.com/biocommons/bioutils/issues/36. It then needs to be enabled in AltTranscriptData / AltSeqBuilder somehow. See a related ticket here.
RefSeq has this data for MT genes; I think this should be sufficient to offer "m_to_p". The data needs to be loaded into seqrepo / UTA so that we can conveniently access it and so that it looks similar to the rest of the data used by hgvs.
Skill Level
Advanced
Required Skills
Python, Mitochondrial HGVS nomenclature