Becksteinlab / kda

Python package used for the analysis of biochemical kinetic diagrams.
GNU General Public License v3.0
3 stars 1 forks source link

ENH: Optimize state probability expression parsing #87

Closed nawtrey closed 2 months ago

nawtrey commented 2 months ago

Description

I found a rather obvious performance optimization regarding expression parsing with sympy. At a high level, we were parsing the same terms in state probability expressions multiple times. This was due to expressions being normalized before parsing, so sympy was parsing each state multiplicity + the normalization expression (the sum of all multiplicity expressions) for each state.

For a diagram with N states, there are N*P terms, where P is the number of partial diagrams. This means we have to parse N*N*P terms to generate all probability expressions. With the current algorithm, we are parsing N*(N*P + N*N*P) terms, which means we are parsing N+1x the number of terms we need to.

To quantify the real world performance difference, I wrote a short script which times the generation of symbolic state probability expressions:

Testing script ```python import os import time import pickle from kda import calculations def load_pickled_graph(path): """ Loads a pickled `NetworkX.MultiDiGraph` object """ with open(path, 'rb') as f: G = pickle.load(f) print(f"Loaded {os.path.basename(path)} -- {G}") return G def main(): graph_paths = [ os.path.abspath("./graph_3_1.pk"), os.path.abspath("./graph_4_1.pk"), os.path.abspath("./graph_5_1.pk"), os.path.abspath("./graph_6_9.pk"), os.path.abspath("./graph_7_17.pk"), ] graphs = [] times = [] for path in graph_paths: G = load_pickled_graph(path=path) start = time.time() prob_exprs = calculations.calc_state_probs(G=G, key="name", output_strings=True) runtime = time.time() - start graphs.append(os.path.basename(path)) times.append(runtime) print(f"\nRuntime to generate all state probability expressions") print("-" * 40) for g, t in zip(graphs, times): print(f"{g}: {t:.3f} s") if __name__ == "__main__": main() ```

Comparing the runtimes on master vs. optimize_prob_expr_parsing:

Graph             master      optimize_prob_expr_parsing
--------------------------------------------------------
graph_3_1         0.094 s           0.062 s
graph_4_1         0.047 s           0.047 s
graph_5_1         0.375 s           0.145 s
graph_6_9         8.274 s           0.837 s
graph_7_17       18.990 s           1.743 s

We don't see a large speedup for the smaller models as the parsing is likely a small % of the runtime for those cases. But for the 6 and 7-state cases, we see a 9.9x and 10.9x, respectively.

Changes

Status

nawtrey commented 2 months ago

I ran the KDA manuscript code (see this comment) on this branch and it completed without issue. I'll go ahead and merge this so we can start taking advantage of the performance benefits.