LinkageIO / Camoco

Camoco is a fully-fledged software package for building co-expression networks and analyzing the overlap interactions among genes.
MIT License

Project Roadmap #63

schae234 closed 5 years ago

schae234 commented 6 years ago

Road Map

Camoco is a Python package for building gene co-expression networks and analyzing overlap interactions among genes and their mutations.

Motivation and Background

Camoco stands for the co-analysis of molecular components. These components are the molecular building blocks of life. An organism's molecular makeup is largely controlled by its genetics, which can be described at a very small scale using the Central Dogma of molecular biology. Briefly, this states that the functional units of cells (proteins and metabolites) are determined with extreme fidelity from DNA, the unit of inheritance.

Although the genetic code was cracked in the 1960s, scientists are often limited in their studies by what they are able to reliably measure. Much like the observer effect often associated with Heisenberg's uncertainty principle in physics, the act of probing cells almost always destroys them (or induces a severe stress response). To boot, the scale at which genetic information is stored redundantly is unfathomable: nearly every cell in your body (of which you have ~40 trillion) contains an entire copy of your genome (containing ~3 billion base pairs). Although the underlying physiological mechanisms are extravagantly complex, there appears to be a deterministic link between a largely static genome and the traits of an organism.

Since its inception, the field of genetics has gotten very good at quantifying the link between organisms' genomes and various traits of interest. Ignoring physiological mechanism, genome-wide association studies (GWAS) look for statistical links between genetic mutations and physical traits. These studies often survey millions of mutations and assess the contribution of each one independently. For many traits, the genetics appear to be simple: the traits (often binary in nature) follow "if this, then that" rules. Other traits (often on a spectrum) appear to be more complex. Limitations in statistical design leave researchers with dozens or hundreds of mutations that somehow contribute to a trait.

In the past decade, molecular techniques have emerged that enable the quantification of gene products, either through an intermediate such as transcription, or through direct quantification of protein or metabolite abundances. Where there are potentially millions of mutations that could affect a trait, there are only tens of thousands (hah!) of protein-coding genes. Paired with modern computing power and scalable architecture, it's possible to characterize how interactions among gene products influence traits. Furthermore, since genetic mutations work in close proximity to their affected gene product, genetic perturbations can be co-analyzed with abundances of gene products to characterize traits of interest.

Camoco aims to encapsulate many of these molecular components and to provide an easy-to-query interface that enables experts to co-analyze data.

Using Camoco, biomedical researchers can:

This last point is supported by this document, which outlines our roadmap and what we hope to accomplish by developing Camoco as a computational framework. Milestones can be tracked either at a high level here in this document or through our Issue Tracker and Project Map, which will contain the discussion and dirty details.

Short Term Goals: (v0.5.0) - What we are working on right now

Medium Term Goals: (v1.0.0) - What we are working on next

Long Term Goals: (v2.0.0+) - Our future vision for Camoco

schae234 commented 6 years ago

I am going to remove this from a specific milestone.

schae234 commented 5 years ago

Closing this as it is outdated.