biocommons / hackathon-2023

Hackathon 2023 projects and planning.

Support Mitochondrial HGVS #4

Closed: korikuzma closed this issue 1 year ago

korikuzma commented 1 year ago

Submitter Name

Andreas Prlic (@andreasprlic)

Submitter Affiliation

Invitae

Requested By

Invitae

Additional Submitter Details

No response

Lead(s)

@andreasprlic

biocommons Repo

hgvs

Project Details

The hgvs library uses the translate_cds function from bioutils. That function already supports alternative translation tables (e.g. for selenoproteins). We need another translation table there for mitochondria, similar to https://github.com/biocommons/bioutils/issues/36. That support then needs to be enabled in AltTranscriptData / AltSeqBuilder somehow. See a related ticket here.

RefSeq has this data for MT genes, and I think it should be sufficient to offer "m_to_p". The data needs to be loaded into SeqRepo / UTA so that it is convenient to access and looks similar to the rest of the data used by hgvs.
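
A minimal, self-contained sketch of what the mitochondrial table changes (this is not bioutils code; `CODONS`, `STANDARD`, `VMITO`, and `translate` are hypothetical names used only for illustration). It builds NCBI translation table 1 from NCBI's compact TCAG-ordered amino-acid string, derives the vertebrate mitochondrial code (NCBI table 2) by overriding the four codons that differ, and translates the same sequence with both:

```python
from __future__ import annotations

from itertools import product

# All 64 codons in NCBI's TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG
CODONS = ["".join(bases) for bases in product("TCAG", repeat=3)]

# NCBI translation table 1 (standard), one amino acid per codon in CODONS order
STANDARD = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

# Vertebrate mitochondrial code (NCBI table 2) expressed as overrides of table 1
VMITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(seq: str, table: dict[str, str]) -> str:
    """Translate the complete codons of a DNA/RNA sequence; a trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    seq = "ATGATATGAAGA"
    print(translate(seq, STANDARD))  # MI*R
    print(translate(seq, VMITO))     # MMW*
```

Only AGA, AGG, ATA, and TGA change, so the new table can be written as a copy of the standard lookup with four overrides, the same pattern bioutils already uses for its selenocysteine table.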

Skill Level

Advanced

Required Skills

Python, Mitochondrial HGVS nomenclature

andreasprlic commented 1 year ago


korikuzma commented 1 year ago

@andreasprlic Thanks! I will just copy this into the Project Details section.

reece commented 1 year ago

@andreasprlic: I don't have a good handle on exactly what it will take to implement this, but I think we're in for at least a new version of UTA, and we both know what that's like.

Would you please do the following?

veenarajaraman commented 1 year ago

Here are some test variants from ClinVar: chrMT_test_variants.csv

Coding regions: NC_012920.1_coding_regions.csv
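
Locating an `m.` position within a coding region is one building block an eventual `m_to_p` would need. A minimal sketch, assuming each region is a 1-based, inclusive start/end on NC_012920.1 plus a strand (the actual columns of the attached CSV may differ); `CodingRegion` and `m_pos_to_codon` are hypothetical helpers, not hgvs API. For minus-strand genes the CDS reads from `end` toward `start`, so the offset is counted from `end`:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CodingRegion:
    """Hypothetical CDS record on NC_012920.1 (1-based, inclusive coordinates)."""

    gene: str
    start: int
    end: int
    strand: int  # +1 or -1


def m_pos_to_codon(region: CodingRegion, m_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based, for an m. position."""
    if region.strand not in (1, -1):
        raise ValueError(f"strand must be 1 or -1, got {region.strand}")
    if not region.start <= m_pos <= region.end:
        raise ValueError(f"m.{m_pos} is outside {region.gene} ({region.start}..{region.end})")
    # 0-based offset from the first CDS base, counted in the direction of transcription
    offset = m_pos - region.start if region.strand == 1 else region.end - m_pos
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    nd1 = CodingRegion(gene="MT-ND1", start=3307, end=4262, strand=1)
    print(m_pos_to_codon(nd1, 3460))  # (52, 1)
```

With the rCRS MT-ND1 CDS span (3307..4262, plus strand), m.3460 lands on the first base of codon 52, in line with the commonly cited m.3460G>A p.(Ala52Thr). For minus-strand genes such as MT-ND6, the reference and alternate bases would also need to be complemented before looking up the codon.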

veenarajaraman commented 1 year ago

We also need a PR to https://github.com/biocommons/bioutils that adds the vertebrate mitochondrial translation table:

diff --git a/src/bioutils/sequences.py b/src/bioutils/sequences.py
index 1a2ce75..c67f966 100644
--- a/src/bioutils/sequences.py
+++ b/src/bioutils/sequences.py
@@ -221,6 +221,18 @@ dna_to_aa1_lut = {  # NCBI standard translation table
 dna_to_aa1_sec = dna_to_aa1_lut.copy()
 dna_to_aa1_sec["TGA"] = "U"

+# Vertebrate mitochondrial translation table
+# https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG2
+
+dna_to_aa1_vmito = dna_to_aa1_lut.copy()
+dna_to_aa1_vmito["AGA"] = "*"
+dna_to_aa1_vmito["AGG"] = "*"
+dna_to_aa1_vmito["ATA"] = "M"
+dna_to_aa1_vmito["TGA"] = "W"
+
+
+
+
 complement_transtable = bytes.maketrans(b"ACGT", b"TGCA")

@@ -506,6 +518,7 @@ class TranslationTable(StrEnum):

     standard = "standard"
     selenocysteine = "sec"
+    vertebrate_mitochondrial = "vmito"

 def translate_cds(seq, full_codons=True, ter_symbol="*", translation_table=TranslationTable.standard):
@@ -596,6 +609,8 @@ def translate_cds(seq, full_codons=True, ter_symbol="*", translation_table=Trans
         trans_table = dna_to_aa1_lut
     elif translation_table == TranslationTable.selenocysteine:
         trans_table = dna_to_aa1_sec
+    elif translation_table == TranslationTable.vertebrate_mitochondrial:
+        trans_table = dna_to_aa1_vmito
     else:
         raise ValueError("Unsupported translation table {}".format(translation_table))
     seq = replace_u_to_t(seq)
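
For callers, the patch means the table is selected through the `translation_table` argument. Since an installed bioutils may not include this change, the sketch below reproduces the dispatch from the diff in standalone form rather than importing bioutils. `TranslationTable` here is a local `(str, Enum)` stand-in, `translate_cds_sketch` is hypothetical and omits the real function's `full_codons` and `ter_symbol` parameters, and a dict lookup replaces the diff's `if`/`elif` chain:

```python
from enum import Enum
from itertools import product

# Standard table (NCBI table 1) in TCAG codon order, rebuilt locally for self-containment
CODONS = ["".join(bases) for bases in product("TCAG", repeat=3)]
dna_to_aa1_lut = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

dna_to_aa1_sec = dna_to_aa1_lut.copy()
dna_to_aa1_sec["TGA"] = "U"

# Same four overrides as in the proposed bioutils patch
dna_to_aa1_vmito = dna_to_aa1_lut.copy()
dna_to_aa1_vmito.update({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})


class TranslationTable(str, Enum):
    """Local stand-in for the bioutils TranslationTable enum."""

    standard = "standard"
    selenocysteine = "sec"
    vertebrate_mitochondrial = "vmito"


# One lookup table per enum member, in place of an if/elif chain
_TABLES = {
    TranslationTable.standard: dna_to_aa1_lut,
    TranslationTable.selenocysteine: dna_to_aa1_sec,
    TranslationTable.vertebrate_mitochondrial: dna_to_aa1_vmito,
}


def translate_cds_sketch(seq, translation_table=TranslationTable.standard):
    """Hypothetical simplified translate_cds: accepts an enum member or its string value."""
    try:
        trans_table = _TABLES[TranslationTable(translation_table)]
    except ValueError:
        raise ValueError(f"Unsupported translation table {translation_table}") from None
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("Sequence length must be a multiple of 3")
    return "".join(trans_table[seq[i:i + 3]] for i in range(0, len(seq), 3))
```

Passing either the enum member or its string value selects the same table: `translate_cds_sketch("ATGATATGAAGA", "vmito")` returns `"MMW*"`, while the default standard table gives `"MI*R"`.
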
korikuzma commented 1 year ago

This will not be worked on at the hackathon. @andreasprlic is going to merge some comments before closing.

andreasprlic commented 1 year ago

We won't get to this issue as part of the hackathon this weekend, but we will continue on this topic afterwards as part of https://github.com/biocommons/hgvs/issues/663.