Camoco is a Python package for building gene co-expression networks and for analyzing the overlap and interactions between genes and their mutations.
Motivation and Background
Camoco stands for the co-analysis of molecular components. These components are the molecular building blocks of life. An organism's molecular makeup is largely controlled by its genetics, which can be described at a very small scale by the Central Dogma of molecular biology. Briefly, the Central Dogma states that genetic information flows from DNA, the unit of inheritance, through RNA to protein, so that the functional units of cells (proteins, and the metabolites they produce) are determined with extreme fidelity by DNA.
Although the structure of DNA was solved in the 1950s and the genetic code was cracked soon after, scientists are often limited in their studies by what they can reliably measure. Much like the observer effect in physics, the act of probing cells almost always destroys them (or induces a severe stress response). To boot, the redundancy with which genetic information is stored is unfathomable: nearly every cell in your body (of which you have ~40 trillion) contains an entire copy of your genome (~3 billion base pairs). Although its physiological mechanisms are extravagantly complex, there appears to be a deterministic link between a largely static genome and the traits of an organism.
Since its inception, the field of genetics has become very good at quantifying the link between organisms' genomes and various traits of interest. Setting physiological mechanism aside, genome-wide association studies (GWAS) look for statistical links between genetic mutations and physical traits. These studies often survey millions of mutations and assess the contribution of each one independently. For many traits, the genetics appears to be simple: the traits (often binary in nature) follow "if this, then that" rules. Other traits (often on a spectrum) appear to be more complex, and limitations in statistical design leave researchers with dozens or hundreds of mutations that somehow contribute to the trait.
In the past decade, molecular techniques have emerged that enable the quantification of gene products, either through an intermediate such as transcript (RNA) abundance or through direct quantification of protein or metabolite abundances. Whereas there are potentially millions of mutations that could affect a trait, there are only (hah!) tens of thousands of protein-coding genes. Paired with modern computing power and scalable architecture, this makes it possible to characterize how interactions among gene products influence traits. Furthermore, since genetic mutations often act on gene products encoded close to them in the genome, genetic perturbations can be co-analyzed with gene product abundances to characterize traits of interest.
Camoco aims to encapsulate many of these molecular components and to provide an easy-to-query interface that enables experts to co-analyze data.
Using Camoco, biomedical researchers can:
Easily turn gene product abundance data (RNA-Seq, metabolite/protein abundance) into gene networks
Map mutations (SNPs, QTL) to annotated gene networks using simple, systematic rules
Statistically identify when genes near trait-associated loci are more strongly connected in the network than expected by chance, indicating high-level pathways and gene cascades that are key influences on the trait being studied
Enjoy a rich and expressive syntax in which to interpret and inter-relate queries
Have peace of mind that the black box behind their results is in fact an open-source, research-driven project supported by a community of people committed to high standards of code and development
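The first three capabilities above can be illustrated with a minimal, stdlib-only Python sketch. It is not Camoco's API, and Camoco's own network construction and statistics may differ: the helpers `build_network`, `snps_to_genes`, `density`, and `density_pvalue`, the |r| >= 0.8 edge threshold, the 50 kb window with one flanking gene per side, and the edge-count "density" statistic are all hypothetical choices made for illustration. The sketch builds a co-expression network from pairwise Pearson correlations, maps SNPs to nearby genes with a simple window rule, and asks whether the mapped genes are more interconnected than random gene sets of the same size.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0  # constant profiles get r = 0


def build_network(expr, threshold=0.8):
    """Hypothetical helper: link every gene pair whose |r| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[frozenset((g1, g2))] = r
    return edges


def snps_to_genes(snps, genes, window=50_000, flank_limit=1):
    """Hypothetical rule: a SNP maps to any gene it falls inside, plus the
    flank_limit nearest genes on each side within `window` bp."""
    candidates = set()
    for chrom, pos in snps:
        upstream, downstream = [], []
        for name, (g_chrom, start, end) in genes.items():
            if g_chrom != chrom or end < pos - window or start > pos + window:
                continue  # different chromosome or entirely outside the window
            if end < pos:
                upstream.append((pos - end, name))
            elif start > pos:
                downstream.append((start - pos, name))
            else:
                candidates.add(name)  # SNP lies within the gene body
        for side in (upstream, downstream):
            candidates.update(name for _, name in sorted(side)[:flank_limit])
    return candidates


def density(gene_set, edges):
    """Number of network edges among the genes in gene_set."""
    return sum(frozenset(pair) in edges for pair in combinations(sorted(gene_set), 2))


def density_pvalue(candidates, all_genes, edges, n_perm=1000, seed=0):
    """Compare observed density to random gene sets of the same size."""
    rng = random.Random(seed)
    pool = sorted(all_genes)
    observed = density(candidates, edges)
    hits = sum(
        density(rng.sample(pool, len(candidates)), edges) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)  # add-one permutation p-value


# Toy data: expression profiles over five samples, plus gene coordinates.
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 4, 3, 2, 1],
    "D": [1, 3, 2, 5, 1],
    "E": [2, 1, 2, 1, 2],
}
genes = {
    "A": ("chr1", 1_000, 2_000),
    "B": ("chr1", 10_000, 11_000),
    "C": ("chr1", 200_000, 201_000),
    "D": ("chr2", 5_000, 6_000),
    "E": ("chr2", 500_000, 501_000),
}
snps = [("chr1", 5_000), ("chr1", 200_500)]

edges = build_network(expr)
candidates = snps_to_genes(snps, genes)  # {"A", "B", "C"}
observed, p = density_pvalue(candidates, genes, edges)
print(sorted(candidates), observed, round(p, 3))
```

In this toy example the two SNPs map to genes A, B, and C, which are perfectly (anti-)correlated and therefore fully connected. The permutation test compares that against random three-gene sets drawn from the same network. Real data would need far more samples and genes, and the threshold, window, and flank limit would need to be tuned rather than taken from this sketch.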
This last point is supported by this document, which outlines our road map and what we hope to accomplish by developing Camoco as a computational framework. Milestones can be tracked either at a high level here in this document or through our Issue Tracker and Project Map, which contain the discussion and dirty details.
Short-Term Goals (v0.5.0) - What we are working on right now
Update the README and installation instructions
Address open issues and bugs
Expand the documentation
Establish a plan for segmenting the codebase and migrating it to LinkageIO
Medium-Term Goals (v1.0.0) - What we are working on next
Fully documented
Extensively tested (CI)
Contains code and analysis examples
A written and submitted scientific manuscript demonstrating academic utility
Code demonstration at the Plant and Animal Genome Conference (PAG)
Long-Term Goals (v2.0.0+) - Our future vision for Camoco