ajshedivy / Pathlinker-project

PathLinker software implementation.
https://github.com/Murali-group/PathLinker
GNU General Public License v3.0

Discussion: Pathways on demand: automated reconstruction of human signaling networks #1

Open ajshedivy opened 5 years ago

ajshedivy commented 5 years ago

Introduction and background

The paper addresses a central problem in systems biology: identifying the reactions in a network that form a set of pathways, or “signals”, from receptors to “downstream” transcriptional regulators (TRs). Databases store interaction data, yet the proteins and interactions recorded for a pathway may vary between databases. This is the motivation behind PathLinker, whose stated goal is to “develop a computational approach to automatically reconstruct signaling pathways from a background network of molecular interactions (the interactome).” Such a method needs two properties. First, it must compute reconstructions that capture a large fraction of the interactions in the curated signaling pathway. Second, to reflect “signal transduction”, the reconstruction must connect receptors to downstream TRs.

How it works

Given an interactome, a set of receptors and a set of TRs are identified for a particular curated pathway. PathLinker then reconstructs the pathway by computing the k shortest paths from any receptor to any TR and ranking each protein and interaction by the index of the first path in which it appears. The curated pathway acts as the ground truth against which the reconstruction is evaluated.
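To make the ranking step concrete, here is a minimal sketch in Python. It is not the PathLinker implementation: it enumerates loopless paths with a simple best-first search, which is only practical on small graphs. The function name `rank_by_k_shortest_paths`, the edge-cost dictionary, and the choice to end a path at the first TR it reaches are all assumptions made for illustration.

```python
import heapq
from itertools import count


def rank_by_k_shortest_paths(edges, receptors, trs, k):
    """Rank edges by the index of the first of the k shortest
    receptor-to-TR paths in which they appear.

    edges: dict mapping a directed edge (u, v) to a non-negative cost.
    Returns (paths, edge_rank), where paths is a list of (cost, path).
    """
    adj = {}
    for (u, v), cost in edges.items():
        adj.setdefault(u, []).append((v, cost))

    targets = set(trs)
    tie = count()  # tie-breaker so the heap never compares path tuples
    # One partial path per receptor, which acts like a single super-source.
    heap = [(0.0, next(tie), (r,)) for r in sorted(set(receptors))]
    heapq.heapify(heap)

    paths, edge_rank = [], {}
    while heap and len(paths) < k:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node in targets:
            # With non-negative costs, complete paths pop in cost order.
            paths.append((cost, path))
            for edge in zip(path, path[1:]):
                edge_rank.setdefault(edge, len(paths))
            continue  # assumption: a path ends at the first TR it reaches
        for nxt, w in adj.get(node, []):
            if nxt not in path:  # keep paths loopless
                heapq.heappush(heap, (cost + w, next(tie), path + (nxt,)))
    return paths, edge_rank


# Toy interactome (hypothetical node names and costs).
toy_edges = {
    ("R", "A"): 1, ("A", "T"): 1,
    ("R", "B"): 1, ("B", "T"): 2,
    ("A", "B"): 1,
}
paths, edge_rank = rank_by_k_shortest_paths(toy_edges, ["R"], ["T"], k=3)
for cost, path in paths:
    print(cost, " -> ".join(path))
```

Increasing k only appends further paths, so each edge keeps the rank it received the first time it appeared.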

Results and evaluation of performance

forthcoming

Experimental validation

forthcoming

Concluding discussion: Reconstructing multiple pathways

Algorithms for reconstructing multiple pathways fall into two classes: those that return a single subnetwork, which yields one point on the precision-recall curve, and those that return a ranked list of interactions, which yields a full precision-recall curve (PathLinker). Single-subnetwork algorithms aim to build a compact network that connects sources to targets, which gives high precision but low recall. Algorithms that produced a ranked list (such as PathLinker) reached a recall ≥ 0.6. The authors conclude that a parameter controlling the size of the reconstruction is an important feature for such algorithms. In PathLinker, the parameter k expands the network smoothly while still guaranteeing that receptors connect to downstream TRs.
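The difference between a single point and a curve can be sketched in a few lines of Python. This is an illustration only: the function names and the toy edge labels are hypothetical, and the evaluation in the paper may treat ties and negatives differently.

```python
def precision_recall_curve(ranked_edges, curated_edges):
    """Return one (precision, recall) point per rank cutoff."""
    curated = set(curated_edges)
    if not curated:
        raise ValueError("curated pathway must contain at least one edge")
    points, hits = [], 0
    for cutoff, edge in enumerate(ranked_edges, start=1):
        if edge in curated:
            hits += 1
        points.append((hits / cutoff, hits / len(curated)))
    return points


def precision_recall_point(subnetwork_edges, curated_edges):
    """A single unranked subnetwork gives exactly one point."""
    predicted, curated = set(subnetwork_edges), set(curated_edges)
    hits = len(predicted & curated)
    return hits / len(predicted), hits / len(curated)


# Hypothetical ranked output and curated pathway.
ranked = ["a", "b", "c", "d"]
curated = {"a", "c", "e"}
for precision, recall in precision_recall_curve(ranked, curated):
    print(f"precision={precision:.2f} recall={recall:.2f}")
print(precision_recall_point(["a", "b"], curated))
```

A ranked list lets the reader pick any cutoff along the curve, whereas a single subnetwork fixes the trade-off in advance.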

Questions

  1. In the design and implementation of algorithms like PathLinker, how are results evaluated for biological relevance and validation? How do you deal with false positives and false negatives? Is a curated pathway always used as a "ground truth"?

  2. How do you measure "close enough"? The authors note that false positive interactions that are "near" a curated pathway may represent valid interactions that were not added to the pathway during curation. They discuss "high-confidence predictions adjacent to pathway" as candidates for further experimental studies.

agitter commented 5 years ago

Very good initial summary and questions. Here are some thoughts on the questions.

In the design and implementation of algorithms like PathLinker, how are results evaluated for biological relevance and validation? How do you deal with false positives and false negatives? Is a curated pathway always used as a "ground truth"?

This is a very difficult problem and different researchers take different approaches. One quantitative approach is to simulate input data by sampling from an existing pathway model. This is similar to what PathLinker did, and does assume the original pathway model is the ground truth. This approach is valuable for understanding how the types of subnetworks returned by a graph algorithm like PathLinker will differ from manually curated pathways.

That approach is not good for learning new biology or for studying processes that are not well characterized with curated pathway models. It also puts too much trust in the curated model. In a recent paper, my group showed that experimental measurements of protein activity in a pathway may not be well represented at all by curated pathway models. Many active proteins are not in the model, and many proteins in the model are not observed to be active.

Other approaches that do not rely as much on pathway models are to do randomization testing or look for independent experimental data. Randomization testing would rerun a method like PathLinker many times with randomly selected input nodes. The idea is that if the random input data produce subnetworks similar to those from the real input data, the model is not very informative.
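A rough sketch of randomization testing in Python follows. Everything here is an assumption for illustration: `run_method` stands in for any reconstruction method that maps a list of input nodes to a set of subnetwork nodes, and Jaccard similarity is just one possible way to compare subnetworks.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two collections of nodes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def randomization_test(run_method, real_inputs, candidate_nodes,
                       n_trials=100, seed=0):
    """Mean similarity between the real subnetwork and subnetworks
    obtained from randomly selected inputs of the same size."""
    rng = random.Random(seed)
    pool = sorted(candidate_nodes)
    real_subnetwork = run_method(real_inputs)
    similarities = []
    for _ in range(n_trials):
        random_inputs = rng.sample(pool, len(real_inputs))
        similarities.append(jaccard(real_subnetwork, run_method(random_inputs)))
    return sum(similarities) / len(similarities)


# Hypothetical method that ignores its inputs entirely: the random runs
# match the real run, so the output says nothing about the inputs.
uninformative = lambda inputs: {"hub1", "hub2"}
print(randomization_test(uninformative, ["r1", "t1"], ["n1", "n2", "n3"]))
```

A mean similarity near 1 suggests the subnetwork is driven by the network structure rather than by the inputs. A low value is necessary but not sufficient evidence that the result is informative.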

Looking for independent experimental data is helpful because it is unbiased and gives strong confirmation that the proteins in the subnetwork may be relevant. However, different types of biological experiments are known to implicate different types of proteins, so if we run PathLinker on input data type A and obtain a subnetwork, we shouldn't expect the nodes in that subnetwork to be deemed important by input data type B.

How do you measure "close enough"? The authors note that false positive interactions that are "near" a curated pathway may represent valid interactions that were not added to the pathway during curation. They discuss "high-confidence predictions adjacent to pathway" as candidates for further experimental studies.

Given a weighted protein interaction network, one measure of close enough would be based on the edge weights. There are also more advanced measures of distance in a network that account for all weighted pathways between a pair of nodes. That could also give a ranking of close nodes.
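As a minimal sketch of the edge-weight idea, the Python below ranks nodes by their shortest weighted distance to any curated pathway node. It assumes edge weights are confidences in (0, 1] and converts them to costs with a negative log so that high-confidence edges are short; that transformation, the undirected treatment of edges, and all names are assumptions for illustration. It covers only the single-best-path measure, not the more advanced measures that account for all paths.

```python
import heapq
import math


def distance_to_pathway(weights, pathway_nodes):
    """Multi-source Dijkstra from all curated pathway nodes.

    weights: dict mapping an undirected edge (u, v) to a confidence in (0, 1].
    Returns a dict of node -> distance to the nearest pathway node.
    """
    adj = {}
    for (u, v), w in weights.items():
        cost = -math.log(w)  # high confidence -> low cost
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    dist = {n: 0.0 for n in pathway_nodes}
    heap = [(0.0, n) for n in sorted(dist)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical weighted network with one curated pathway node "P".
toy_weights = {("P", "X"): 0.9, ("X", "Y"): 0.5, ("P", "Y"): 0.2}
dist = distance_to_pathway(toy_weights, {"P"})
for node in sorted((n for n in dist if n != "P"), key=dist.get):
    print(node, round(dist[node], 3))
```

Nodes below some distance threshold could then be treated as "near" the pathway, though where to set that threshold remains a judgment call.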

Another biological way to think about this is in terms of relationships between curated pathway models. The divisions between one pathway and another are somewhat arbitrary because many of these pathways and processes are related. For example, the KEGG Ras pathway has a white node called PI3K-Akt signaling pathway. That indicates there is an entire additional pathway related to that part of the Ras pathway graph. In biology, these relationships between close pathways are sometimes called cross-talk between pathways.

The authors are correct that the decision about what is included or excluded in a curated pathway model is subjective. Some human curators will include only the most essential interactions. Others are more expansive. This varies by the type of pathway database.