I found a rather obvious performance optimization regarding expression parsing with `sympy`. At a high level, we were parsing the same terms in the state probability expressions multiple times. This was due to expressions being normalized before parsing, so `sympy` was parsing each state multiplicity expression plus the normalization expression (the sum of all multiplicity expressions) for each state.
For a diagram with `N` states, there are `N*P` terms, where `P` is the number of partial diagrams. This means we have to parse `N*N*P` terms to generate all probability expressions. With the current algorithm, we are parsing `N*(N*P + N*N*P)` terms, which means we are parsing `(N+1)x` the number of terms we actually need to.
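To make the redundancy factor concrete, here is a quick check of the `(N+1)x` claim using the term counts above (the `N` and `P` values are hypothetical, chosen only for illustration):

```python
N, P = 6, 9  # hypothetical state and partial-diagram counts

# terms required to generate all probability expressions, per the counts above
needed = N * N * P
# terms parsed by the current (pre-optimization) algorithm
current = N * (N * P + N * N * P)

# the redundancy factor reduces to N + 1 regardless of P
assert current / needed == N + 1
```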
To quantify the real world performance difference, I wrote a short script which times the generation of symbolic state probability expressions:
Testing script
```python
import os
import pickle
import time

from kda import calculations


def load_pickled_graph(path):
    """
    Loads a pickled `NetworkX.MultiDiGraph` object
    """
    with open(path, 'rb') as f:
        G = pickle.load(f)
    print(f"Loaded {os.path.basename(path)} -- {G}")
    return G


def main():
    graph_paths = [
        os.path.abspath("./graph_3_1.pk"),
        os.path.abspath("./graph_4_1.pk"),
        os.path.abspath("./graph_5_1.pk"),
        os.path.abspath("./graph_6_9.pk"),
        os.path.abspath("./graph_7_17.pk"),
    ]
    graphs = []
    times = []
    for path in graph_paths:
        G = load_pickled_graph(path=path)
        start = time.time()
        prob_exprs = calculations.calc_state_probs(G=G, key="name", output_strings=True)
        runtime = time.time() - start
        graphs.append(os.path.basename(path))
        times.append(runtime)

    print("\nRuntime to generate all state probability expressions")
    print("-" * 40)
    for g, t in zip(graphs, times):
        print(f"{g}: {t:.3f} s")


if __name__ == "__main__":
    main()
```
Comparing the runtimes on `master` vs. `optimize_prob_expr_parsing`:

| Graph | `master` | `optimize_prob_expr_parsing` |
| --- | --- | --- |
| graph_3_1 | 0.094 s | 0.062 s |
| graph_4_1 | 0.047 s | 0.047 s |
| graph_5_1 | 0.375 s | 0.145 s |
| graph_6_9 | 8.274 s | 0.837 s |
| graph_7_17 | 18.990 s | 1.743 s |
We don't see a large speedup for the smaller models, as parsing is likely a small percentage of the runtime for those cases. But for the 6- and 7-state cases, we see 9.9x and 10.9x speedups, respectively.
Changes
Changes the way state multiplicity expressions are summed: each multiplicity expression is first parsed with `sympy`, then the parsed expressions are combined using `sum()`. This way each term is parsed a single time instead of `N+1` times, which offers a significant speedup for complex diagrams, especially those with many states.
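The change can be illustrated with a minimal sketch. The multiplicity strings below are made-up placeholders for a hypothetical 3-state model (not KDA output), but they show the difference between re-parsing the normalization for every state and parsing each multiplicity once:

```python
from sympy.parsing.sympy_parser import parse_expr

# Hypothetical state multiplicity strings (illustrative only)
mult_strs = [
    "k21*k31 + k23*k31 + k21*k32",
    "k12*k31 + k12*k32 + k13*k32",
    "k13*k21 + k12*k23 + k13*k23",
]

# Old approach: build the normalized string for each state, so the
# normalization terms get re-parsed N times
norm_str = " + ".join(mult_strs)
old = [parse_expr(f"({m})/({norm_str})") for m in mult_strs]

# New approach: parse each multiplicity once, then combine with sum()
mults = [parse_expr(m) for m in mult_strs]
norm = sum(mults)
new = [m / norm for m in mults]

# Both approaches yield equivalent probability expressions
assert all((a - b).simplify() == 0 for a, b in zip(old, new))
```

The resulting expressions are identical; only the number of string-parsing passes changes.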
I ran the KDA manuscript code (see this comment) on this branch and it completed without issue. I'll go ahead and merge this so we can start taking advantage of the performance benefits.