Standardized Nodes Output File?

Reed-CompBio / spras

Signaling Pathway Reconstruction Analysis Streamliner (SPRAS)

MIT License


As @Lyce24 implements a random walk with restarts (RWR) algorithm, it occurs to us that some algorithms produce useful per-node information (in RWR's case, the node visitation probability). Right now, SPRAS standardizes a pathway only as an edge list file. Some options we could discuss:

1. Essentially ignore node information in the SPRAS outputs. Intermediate files may still contain this information for different algorithms.
2. Add a standardized node file, which is written if there is useful node information. This would often require developers to do a bit of extra work when there's no useful node information.
3. Include node information as extra columns in the pathway.txt standardized output file. The information would be redundant, since the node info would be written for EVERY edge that the node is incident on, but it would keep the current file format for pathways.
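To make the redundancy in the last option concrete, here is a small sketch. The edge list, the scores, and both helper names are made up for illustration and are not part of SPRAS. It builds edge rows with one score column per endpoint and counts how often each node's score ends up being written.

```python
from collections import Counter

# Toy pathway: a star centered on A. Scores are made-up values.
edges = [("A", "B"), ("A", "C"), ("A", "D")]
node_score = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.2}


def edge_rows_with_node_columns(edges, node_score):
    """Hypothetical edge-file layout: node1, node2, score(node1), score(node2)."""
    return [(u, v, node_score[u], node_score[v]) for u, v in edges]


def times_written(edges):
    """Count how many times each node's score appears across all edge rows."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


rows = edge_rows_with_node_columns(edges, node_score)
counts = times_written(edges)
```

In this toy example the score for A is written three times, once per incident edge, while a separate node file would store it once.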

The standardized node file seems like the right choice. For algorithms that don't generate node information, we could provide a util function that takes in the pathway file and automatically writes a default node output file, so that developers don't have to think about it too much. Something like write_default_node_scores("pathway.txt", "nodes.txt").
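A minimal sketch of what such a helper might look like. Several things here are assumptions rather than anything SPRAS defines: that each pathway line is whitespace-delimited with the two endpoint names in the first two fields, the tab-separated node/score output layout, the default score of 1.0, and the has_header parameter.

```python
from pathlib import Path


def write_default_node_scores(pathway_file, node_file, default_score=1.0, has_header=False):
    """Write every node seen in pathway_file to node_file with a placeholder score.

    Assumes (not confirmed by SPRAS) that the first two whitespace-separated
    fields of each pathway line are the edge's endpoints.
    """
    lines = Path(pathway_file).read_text().splitlines()
    if has_header:
        lines = lines[1:]

    nodes = []
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        for node in fields[:2]:
            if node not in seen:  # keep first-seen order, no duplicates
                seen.add(node)
                nodes.append(node)

    with open(node_file, "w") as out:
        out.write("node\tscore\n")  # hypothetical header, not a SPRAS convention
        for node in nodes:
            out.write(f"{node}\t{default_score}\n")
    return nodes
```

An algorithm wrapper with no node-level output could then call write_default_node_scores("pathway.txt", "nodes.txt") right after it writes its pathway file, so every algorithm produces both files.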

Before we proceed, can we outline what other methods (existing or on the SPRAS roadmap) also generate node information so we can think through how it can be used? And how do we envision using node information in downstream tasks? For instance, it would be great to load it into our Cytoscape visualizations. Would we load it as a generic "score", or do we also need to track a label saying that this score is a node visitation probability?


Standardized Nodes Output File? #88