discussion: Mathematical and computational modeling of signaling networks and pathways

ajshedivy commented 5 years ago

I will use this thread to post discussions of computational methods for traversing, reconstructing, and identifying signaling pathways and interaction networks in a biological setting. The aim is to deepen my understanding of the fundamental principles, or "theoretical" side, of these methods. My goal is to soon have a page up that discusses the results and performance of such algorithms.

Please feel free to send me any papers or other resources that talk about biological signaling or any other topic that we have discussed.

ajshedivy commented 5 years ago

Summary: Integration of Proteomic, Transcriptional, and Interactome Data Reveals Hidden Signaling Components

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2889494/

Abstract

Cellular signaling and regulatory networks underlie fundamental biological processes such as growth, differentiation, and response to a specific environment. A problem with data from transcriptional and genetic assays is that the identified hits may lie outside the expected pathways. These unexpected results can give interesting insights into new biological processes, but they are often the hardest to interpret for biological significance. "This paper presents a technique, based on the Steiner tree problem, that uses previously reported protein-protein and protein-DNA interactions to determine how these hits are organized into functionally coherent pathways, revealing many components of the cellular response that are not readily apparent in the original data."

Introduction

Experiments identify the molecular changes that occur in cells in response to stimuli. The results give us a systematic view of signaling and regulatory changes and can uncover previously unrecognized components. For example, genetic screening identifies sets of genes whose expression changes lead to an altered phenotype. The products of these genes are likely to be involved in the regulatory pathways. The problem lies in the data: datasets are often incomplete and incompatible with each other, which shows that there are still gaps in our knowledge of regulatory networks.

Goals

The number of unexpected components within these cellular responses presents a challenge to computational scientists and biologists. Computational methods have the power to use this data to find new insights and context for cellular processes. However, any computational approach must overcome the fact that not all components in the regulatory networks can be exposed in one experiment due to systematic biases. For instance, compensatory mechanisms can mask the consequences of genetic manipulations, and a stimulus may cause "conformational changes" in a protein that would not be detected by mass spectrometry. This produces a set of "hidden" components that are not detected by experimentation. Hidden components (or nodes in a network) are critical for interpreting the functional significance of the data.

The goal is to "construct a network of protein-protein and protein-DNA interactions, including hidden components, that explains the functional context of genes and proteins detected in such assays." This approach takes advantage of the large number of protein-protein and protein-DNA interactions present in an interactome. The method is attractive because it integrates molecular pathways that are already known but also expands beyond them.

Methods

Huang and Fraenkel briefly outline three approaches for discovering meaningful regulatory networks that link the identified genes.

Enriched regions of differential expression: This method combines information from phenotype data and a PPI network and searches for enriched regions of expression.
- con: this approach does not consider the variability of interaction data from multiple datasets, which can lead to connecting false positives.
Flow-based approach: This method starts with an interactome, and the goal is to find connections linking a set of transcribed genes to a second set of genetic hits that represent the upstream signal.
- con: this approach can miss relevant connections in proteomic data because they lack a "direct link" to transcriptional changes.
Constrained optimization (proposal of Huang and Fraenkel): This view takes proteins and genes that have been detected in experiments and uses this data to drive the selection of relevant pathways from the interactome. To avoid "forcing a solution", the goal of connecting the data is treated as a constraint to be satisfied through optimization. Huang and Fraenkel claim that this problem can be modeled using a prize-collecting variant of the Steiner tree problem.

The Steiner tree problem

The Steiner tree problem takes a weighted graph and a set of designated "terminal" nodes in that graph. The task is to connect these terminal nodes, either directly or indirectly, through the edges of the graph. The prize-collecting variant relaxes the constraints so that not all termini are required to be in the solution. Instead, the algorithm balances two costs: (1) it pays a penalty for leaving a terminal out of the network; (2) it pays a price for each edge it uses to include a terminal in the network. The size of the network is controlled by a parameter that weights the penalties for excluding terminal nodes relative to the cost of including edges. These costs and penalties are determined by the reliability and importance of each edge or terminal node in the experimental data. The solution is a minimum-weight subtree that connects a subset of the termini to each other through edges of the interactome graph (additional nodes may also be included).
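The trade-off described above can be sketched with a brute-force search on a toy graph. This is only a minimal illustration, not the solver used in the paper: it assumes the objective is the total cost of the edges in the tree plus `beta` times the summed prizes of the nodes left out, and the names `prizes`, `edges`, `beta`, and the toy nodes A, B, C, and H are hypothetical placeholders. Enumerating every node subset is feasible only for very small graphs.

```python
from itertools import combinations


def mst_cost(nodes, edges):
    """Kruskal's algorithm restricted to `nodes`.

    Returns the minimum spanning tree cost, or None if the induced
    subgraph is not connected.
    """
    nodes = set(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    total, used = 0.0, 0
    for (u, v), cost in sorted(edges.items(), key=lambda kv: kv[1]):
        if u in nodes and v in nodes:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += cost
                used += 1
    return total if used == len(nodes) - 1 else None


def pcst_brute_force(prizes, edges, beta=1.0):
    """Try every non-empty node subset and keep the cheapest tree.

    Assumed objective: edge costs in the tree
    plus beta * (prizes of the nodes left out of the tree).
    """
    all_nodes = sorted(prizes)
    best = None
    for k in range(1, len(all_nodes) + 1):
        for subset in combinations(all_nodes, k):
            cost = mst_cost(subset, edges)
            if cost is None:
                continue  # subset cannot be connected into a tree
            penalty = beta * sum(p for v, p in prizes.items() if v not in subset)
            score = cost + penalty
            if best is None or score < best[0]:
                best = (score, list(subset))
    return best


# Toy data (hypothetical): A, B, C are terminals with prizes; H is a
# "hidden" node with no prize that offers a cheap connection.
prizes = {"A": 5, "B": 5, "C": 1, "H": 0}
edges = {("A", "H"): 1, ("H", "B"): 1, ("A", "B"): 3, ("B", "C"): 4}

print(pcst_brute_force(prizes, edges, beta=1.0))   # (3.0, ['A', 'B', 'H'])
print(pcst_brute_force(prizes, edges, beta=10.0))  # (6.0, ['A', 'B', 'C', 'H'])
```

With `beta=1.0` the tree routes through the prize-free node H and drops the low-prize terminal C, because reaching C costs more than its penalty. Raising `beta` to 10 makes excluding C more expensive than the edge to it, so the tree grows to include every terminal.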

Discussion

The prize-collecting Steiner tree problem (PCST) is elegant in its formulation. It takes an intuitive approach to what we are trying to accomplish when we construct these networks of coherent pathways, and it ultimately turns our challenge into an optimization problem. What are further advantages and disadvantages of using a "constrained optimization" view of the overall objective?

ajshedivy commented 5 years ago

re-open issue

agitter commented 5 years ago

The biology jargon "conformational changes" means that a protein may change its shape because of a stimulus but not change its abundance. The cell does not make more copies of the protein; rather, the existing copies reconfigure themselves.

> What are further advantages and disadvantages of using a "constrained optimization" view of the overall objective?

I'd argue that many related pathway-finding approaches use a "constrained optimization" approach. Here, it takes the form of the Steiner tree objective. But even the related flow approach is still constrained optimization: the objective is to push flow from source nodes to target nodes, and the constraints limit how much flow can pass through each individual edge. In PathLinker, the objective is to connect sources and targets, and the constraints are to do so with short weighted paths.
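To make the "connect sources and targets with short weighted paths" objective concrete, here is a minimal Python sketch that finds the single lowest-cost path from any source to any target using Dijkstra's algorithm. It is not PathLinker's actual implementation and illustrates only the objective stated above; the function name, the directed toy edges, and the node labels are hypothetical, and edge weights are assumed to be non-negative.

```python
import heapq


def shortest_source_target_path(edges, sources, targets):
    """Lowest-cost path from any node in `sources` to any node in `targets`.

    `edges` maps directed pairs (u, v) to non-negative weights.
    Returns (cost, path) or None if no target is reachable.
    """
    graph = {}
    for (u, v), w in edges.items():
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, [])

    # Start the search from all sources at distance zero.
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sorted(sources)]
    heapq.heapify(heap)

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if u in targets:
            # Walk predecessor links back to the source.
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


# Toy network (hypothetical): receptor R signals to transcription factor TF
# through one of two intermediate kinases.
toy_edges = {("R", "K1"): 1.0, ("K1", "TF"): 1.0, ("R", "K2"): 0.5, ("K2", "TF"): 3.0}
print(shortest_source_target_path(toy_edges, {"R"}, {"TF"}))  # (2.0, ['R', 'K1', 'TF'])
```

The route through K1 wins even though the first step to K2 is cheaper, because the constraint is on the total weight of the whole path.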

One other general comment about summarizing research papers is that I find it helpful to use quotation formatting when copying notes directly from the paper. This can be a life saver when looking at old notes years later because you won't remember what is your summary and what is original text from the paper. I had a collaborator accidentally copy text from a paper into their manuscript when going back to old notes.

ajshedivy / Pathlinker-project