Open davmlaw opened 1 year ago
There are a few codons that are translated differently by mitochondria: https://en.wikipedia.org/wiki/Human_mitochondrial_genetics#Genetic_code_variants so the wrapper would need a bit more code to deal with those.
HGVS uses the translate_cds method from bioutils. That one already supports alternative translation tables (eg. for selenoproteins). We need another translation table there for mitochondria. That would be similar to this. Then this needs to get enabled with the AltTranscriptData / AltSeqBuilder somehow.
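To make the table difference concrete, here is a minimal standalone sketch of what a vertebrate mitochondrial translation table looks like next to the standard one. It does not use bioutils; the names `STANDARD_TABLE`, `VERT_MITO_TABLE` and `translate` are made up for illustration, and the four overrides are the ones from NCBI translation table 2 (vertebrate mitochondrial).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Vertebrate mitochondrial code (NCBI table 2): only four codons differ.
VERT_MITO_TABLE = {
    **STANDARD_TABLE,
    "AGA": "*",  # Arg  -> stop
    "AGG": "*",  # Arg  -> stop
    "ATA": "M",  # Ile  -> Met
    "TGA": "W",  # stop -> Trp
}


def translate(seq, table):
    """Translate full codons of a DNA coding sequence with the given table.

    Hypothetical helper for illustration only; trailing partial codons are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))


example = "ATGATATGAAGA"
standard_protein = translate(example, STANDARD_TABLE)
mito_protein = translate(example, VERT_MITO_TABLE)
```

With this example sequence the two tables disagree on three of the four codons, which is why an m_to_p style conversion can't just reuse the standard table.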
About "Refseq does not have MT transcripts": Would this data be sufficient for us to offer "m_to_p"?
A comment on the initial request: the transcript model for mitochondria is probably slightly different. I am not sure we would express mitochondrial variants using c. nomenclature; I think people would refer to them as m. or p. variants. See some test variants that have been provided as part of https://github.com/biocommons/hackathon-2023/issues/4 . We won't get to work on the mito ticket as part of the hackathon, but will follow up on this afterwards.
This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.
RefSeq does not have MT transcripts, but Ensembl does, eg ENST00000361381.2
There are currently no "m" methods (eg m_to_c) in AssemblyMapper, so you can't convert these easily.
Q. Should we add m_to_x and x_to_m methods in AssemblyMapper?
It's pretty trivial to get it to work, by switching the 'm' to a 'g' (this is probably how you'd implement it)
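As a rough illustration of the 'm' to 'g' switch at the string level (a sketch only: `swap_type`, `m_to_g_string` and `g_to_m_string` are hypothetical helpers, not part of hgvs or AssemblyMapper), the idea is to relabel the variant type, run the existing g_to_x / x_to_g conversions, and relabel back where needed:

```python
import re

# Matches "<accession>:<type>.<rest>", e.g. "NC_012920.1:m.3243A>G".
_HGVS_TYPE_RE = re.compile(r"^(?P<ac>[^:]+):(?P<type>[a-z])\.(?P<rest>.+)$")


def swap_type(hgvs_string, old, new):
    """Return hgvs_string with its variant type letter changed from old to new.

    Hypothetical helper: only the type letter is rewritten; the accession and
    the position/edit part are passed through untouched.
    """
    match = _HGVS_TYPE_RE.match(hgvs_string)
    if match is None or match.group("type") != old:
        raise ValueError(f"expected a '{old}.' variant, got: {hgvs_string!r}")
    return f"{match.group('ac')}:{new}.{match.group('rest')}"


def m_to_g_string(hgvs_string):
    # Treat the mitochondrial variant as genomic so g_to_x methods accept it.
    return swap_type(hgvs_string, "m", "g")


def g_to_m_string(hgvs_string):
    # Relabel a genomic result on the mitochondrial contig back to 'm'.
    return swap_type(hgvs_string, "g", "m")


as_g = m_to_g_string("NC_012920.1:m.3243A>G")
round_trip = g_to_m_string(as_g)
```

A real m_to_x / x_to_m implementation would presumably do the same swap on the parsed variant object rather than on the string, and would want to check that the accession really is a mitochondrial contig before relabelling.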
I don't think you can reproduce this using UTA as it doesn't have Ensembl transcripts, but here's the cdot code: