cov-lineages / constellations

Question: Nucleotide mutations behind amino acid mutations in .json definitions #63

Closed mikelchtermans closed 2 years ago

mikelchtermans commented 2 years ago

Hello,

For every constellation, a definition file is present in definitions/. Mutations are written in two different ways: as nucleotide changes and as amino-acid changes. However, for amino-acid mutations, several different nucleotide mutations within a codon can produce the same amino acid. Do you have an underlying list of the nucleotide mutations behind these amino-acid mutations? If not, what is the rationale behind this seeming "loss of information"? Does only the resulting functional gene matter for virus behaviour, so that it is not important which nucleotide is mutated?

Thank you in advance, and kind regards, Michaël

rmcolq commented 2 years ago

Good question. Here is our rationale:

1. These files are designed to be useful when read by a human, and we thought that the amino acid change (without having to go and translate manually) was more useful for interpretation.
2. There were some examples, e.g. ?+E484K, where we explicitly wanted to capture different nucleotide sequences which yield the same amino acid change.
3. We wanted to be more robust to slight variations in alignment.
4. These definitions were written to be used as a whole by scorpio.

Certainly until now we considered the probability of accidentally having a different nucleotide change giving the same amino acid, and of this being the difference between a true negative and a false positive call, to be low overall.
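To illustrate the point about several nucleotide sequences yielding the same amino acid change, here is a minimal Python sketch using the standard genetic code. It is not part of scorpio or the constellations repository; the helper names (`codons_for`, `nucleotide_routes`) are hypothetical, and the reference codon `GAA` for E484 is assumed here for illustration only.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def nucleotide_routes(ref_codon, alt_amino_acid):
    """Map each codon encoding alt_amino_acid to the list of
    (position_in_codon, ref_base, alt_base) changes from ref_codon."""
    routes = {}
    for codon in codons_for(alt_amino_acid):
        routes[codon] = [
            (i, ref, alt)
            for i, (ref, alt) in enumerate(zip(ref_codon, codon))
            if ref != alt
        ]
    return routes


# Assumed reference codon for E484 (hypothetical input for illustration).
for codon, changes in nucleotide_routes("GAA", "K").items():
    print(codon, changes)
```

Under that assumption, both `AAA` (one nucleotide change) and `AAG` (two changes) encode K, so a definition written as an amino acid change matches either, whereas a single nucleotide-level definition would only match one of them.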

mikelchtermans commented 2 years ago

Thanks a lot for the clarifying answer!