berthubert / bnt162b2

Markdown version of Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine

Algo attempt #3

Open Sjors opened 3 years ago

Sjors commented 3 years ago

~62.4%~ 62.7%

I'll see if I can do better.

Sjors commented 3 years ago

There's clearly more going on than just optimising each codon independently.

Let's look at Alanine, which occurs 79 times. My current algorithm always uses GCC, which works out most of the time.

The only time my algorithm doesn't use GCC is when the virus already used the equally optimal GCG. The vaccine does not make that exception; see the last two rows. Overall this heuristic still performs (slightly) better, though that may be a coincidence.
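As a rough illustration, the Alanine rule described above could be sketched as follows. The function name `choose_alanine_codon` and the input normalisation are assumptions made for this example; this is not the actual code behind the score.

```python
# Hypothetical sketch of the Alanine heuristic from this comment:
# always emit GCC, except when the virus already uses GCG.
ALANINE_CODONS = {"GCA", "GCC", "GCG", "GCT"}


def choose_alanine_codon(virus_codon: str) -> str:
    """Return the codon this heuristic would pick for an Alanine position."""
    # Accept RNA or DNA spelling, upper or lower case (an assumption of this sketch).
    codon = virus_codon.upper().replace("U", "T")
    if codon not in ALANINE_CODONS:
        raise ValueError(f"{virus_codon!r} does not encode Alanine")
    if codon == "GCG":
        # The exception: leave an existing GCG untouched.
        return codon
    return "GCC"
```

Dropping the `GCG` branch gives the "always GCC" behaviour that the vaccine itself appears to follow in the last two rows.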

Screenshot 2020-12-31 at 21:15:38

But more interesting are the other rows, where the vaccine sometimes doesn't make a substitution at all, and other times makes a different substitution than one might expect. Sometimes it swaps a T for an A, other times an A for a T.
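To look for a pattern in those swaps, one option is to tally which nucleotide changes occur between each virus codon and its vaccine counterpart. The helpers `codon_diff` and `tally_swaps` below are hypothetical, and the codon pairs are made-up inputs for illustration, not rows taken from the screenshot.

```python
from collections import Counter


def codon_diff(virus_codon, vaccine_codon):
    """List (position, virus_base, vaccine_base) for each differing position (1-based)."""
    a = virus_codon.upper().replace("U", "T")
    b = vaccine_codon.upper().replace("U", "T")
    if len(a) != 3 or len(b) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    return [(pos, x, y) for pos, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def tally_swaps(pairs):
    """Count nucleotide changes such as 'T>A' over (virus, vaccine) codon pairs."""
    counts = Counter()
    for virus_codon, vaccine_codon in pairs:
        for _pos, old, new in codon_diff(virus_codon, vaccine_codon):
            counts[f"{old}>{new}"] += 1
    return counts


# Made-up example pairs: one T-to-A swap, one A-to-T swap, one unchanged codon.
example_pairs = [("GCT", "GCA"), ("GCA", "GCT"), ("GCC", "GCC")]
swaps = tally_swaps(example_pairs)
```

Run over the real codon pairs, a tally like this would show whether T-to-A and A-to-T swaps are roughly balanced or skewed in one direction.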

Perhaps (efficient) RNA folding also plays a role?

https://www.nrc.nl/nieuws/2020/11/27/het-virus-tref-je-het-hardst-in-zijn-rna-a4021689