Start here: corona.py
This project applies techniques from reverse engineering to understand the SARS-CoV-2 virus. The goal here is simply to build an understanding of the virus from first principles.
Biological systems are fundamentally information processing systems. While not a perfect analogy, software provides a useful framework for thinking about biology. The table below provides a rough outline of this analogy.
:microscope: Biology | :computer: Software | Notes |
---|---|---|
nucleotide | byte | |
genome | bytecode | |
translation | disassembly | 3 byte wide instruction set with arbitrary "reading frames" |
protein | function | a polyprotein is a function with multiple pieces |
protein secondary structure | basic blocks | 80% accuracy in prediction |
protein tertiary structure | | This seems like the hard one to predict: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205819 |
quaternary structure | compiled function with inlining | https://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction_prediction |
gene | library | bacteria are statically linked, viruses are dynamically linked |
transcription | loading | |
protein structure prediction | library identification | |
genome analysis | static analysis | |
molecular dynamics simulations of protein folding | dynamic analysis | Simulation doesn't seem to work yet. Constrained by tooling and compute. |
no equivalent | execution | We are reverse engineering a CAD format. Runs more like FPGA code, all at once. No serial execution. (What are the FPGA reverse engineering tools?) |
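
To make the "translation | disassembly" row concrete, here is a minimal sketch of what "reading frames" means: the same sequence decodes into different 3-letter codons depending on the starting offset, much like disassembling bytecode from the wrong address. The helper name `codons` and the toy sequence are illustrative assumptions, not code from this repository.

```python
def codons(seq, frame=0):
    """Split seq into consecutive 3-letter codons, starting at offset `frame`.

    Trailing letters that do not fill a whole codon are dropped.
    """
    usable = len(seq) - (len(seq) - frame) % 3
    return [seq[i:i + 3] for i in range(frame, usable, 3)]


# Toy RNA string (hypothetical, not taken from the virus genome).
rna = "AUGGCCUAAGG"

# The three forward reading frames give three different "instruction streams".
for frame in range(3):
    print(frame, codons(rna, frame))
```

Only frame 0 of this toy sequence begins with the start codon `AUG`, which is why picking the right frame matters before any further interpretation.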
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA and RNA sequences. The SARS-CoV-2 sequences available in GenBank are downloaded by `download_sequences.py`.
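
As a rough illustration of fetching a GenBank record, the sketch below builds a request to the NCBI E-utilities `efetch` endpoint and parses the FASTA text it returns. This is an assumption about one way to do it, not a description of what `download_sequences.py` actually does. The helper names (`efetch_url`, `fetch_sequence`, `parse_fasta`) are hypothetical, and `NC_045512.2` is used only as an example accession (the RefSeq SARS-CoV-2 reference genome).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for retrieving records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for a nucleotide record (hypothetical helper)."""
    params = {"db": "nuccore", "id": accession,
              "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_sequence(accession):
    """Download the record as text. Requires network access."""
    with urlopen(efetch_url(accession)) as resp:
        return resp.read().decode("utf-8")


def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:])
    return header, sequence


# Example accession, assumed here for illustration only.
print(efetch_url("NC_045512.2"))
```

Calling `parse_fasta(fetch_sequence("NC_045512.2"))` would then yield the header and the genome as one string. Note that GenBank stores the sequence with `T` rather than `U`.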
`lib.py` contains a function `translate` that converts an RNA sequence to a chain of amino acids. `corona.py` uses it to identify and annotate functions for all proteins encoded by the genome.
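
For readers unfamiliar with the translation step, the following is a minimal sketch of codon-by-codon translation using the standard genetic code. It is not the `translate` function from `lib.py`; the name `translate_sketch`, the `stop_at_stop` parameter, and the choice to emit `X` for unrecognized codons are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, laid out in the conventional U, C, A, G order.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_sketch(rna, stop_at_stop=True):
    """Translate an RNA (or DNA) string into one-letter amino acid codes.

    Reads from the first character in steps of three. Incomplete trailing
    codons are ignored; codons with ambiguous bases become 'X'.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino == "*" and stop_at_stop:
            break  # stop codon ends the protein
        protein.append(amino)
    return "".join(protein)


print(translate_sketch("AUGGCCUAAGG"))
```

Passing `stop_at_stop=False` keeps the `*` markers in the output, which is convenient when scanning a long stretch of genome for candidate open reading frames.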
The OpenMM toolkit is used for molecular simulation of protein folding in `fold.py`.
:warning: Disclaimer: The information in this repository is for informational purposes only. It is not medical advice.