I found a rather obvious performance optimization regarding expression parsing with `sympy`. At a high level, we were parsing the same terms in the state probability expressions multiple times. This was due to expressions being normalized before parsing, so `sympy` was parsing each state multiplicity expression plus the normalization expression (the sum of all multiplicity expressions) for each state.
For a diagram with `N` states, there are `N*P` terms, where `P` is the number of partial diagrams. This means we have to parse `N*N*P` terms to generate all probability expressions. With the current algorithm, we are parsing `N*(N*P + N*N*P)` terms, which means we are parsing `(N+1)x` the number of terms we actually need to.
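To make the redundancy factor concrete, here is a quick check of the `(N+1)x` claim using the term counts above (the `N` and `P` values are hypothetical, chosen only for illustration):

```python
N, P = 6, 9  # hypothetical state and partial-diagram counts

# terms required to generate all probability expressions, per the counts above
needed = N * N * P
# terms parsed by the current (pre-optimization) algorithm
current = N * (N * P + N * N * P)

# the redundancy factor reduces to N + 1 regardless of P
assert current / needed == N + 1
```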
To quantify the real world performance difference, I wrote a short script which times the generation of symbolic state probability expressions:
Testing script
```python
import os
import pickle
import time

from kda import calculations


def load_pickled_graph(path):
    """
    Loads a pickled `NetworkX.MultiDiGraph` object
    """
    with open(path, 'rb') as f:
        G = pickle.load(f)
    print(f"Loaded {os.path.basename(path)} -- {G}")
    return G


def main():
    graph_paths = [
        os.path.abspath("./graph_3_1.pk"),
        os.path.abspath("./graph_4_1.pk"),
        os.path.abspath("./graph_5_1.pk"),
        os.path.abspath("./graph_6_9.pk"),
        os.path.abspath("./graph_7_17.pk"),
    ]
    graphs = []
    times = []
    for path in graph_paths:
        G = load_pickled_graph(path=path)
        start = time.time()
        prob_exprs = calculations.calc_state_probs(G=G, key="name", output_strings=True)
        runtime = time.time() - start
        graphs.append(os.path.basename(path))
        times.append(runtime)

    print("\nRuntime to generate all state probability expressions")
    print("-" * 40)
    for g, t in zip(graphs, times):
        print(f"{g}: {t:.3f} s")


if __name__ == "__main__":
    main()
```
Comparing the runtimes on `master` vs. `optimize_prob_expr_parsing`:

| Graph | `master` | `optimize_prob_expr_parsing` |
| --- | --- | --- |
| graph_3_1 | 0.094 s | 0.062 s |
| graph_4_1 | 0.047 s | 0.047 s |
| graph_5_1 | 0.375 s | 0.145 s |
| graph_6_9 | 8.274 s | 0.837 s |
| graph_7_17 | 18.990 s | 1.743 s |
We don't see a large speedup for the smaller models, as parsing is likely a small percentage of the runtime for those cases. But for the 6- and 7-state cases, we see 9.9x and 10.9x speedups, respectively.
Changes
Changes the way state multiplicity expressions are summed: each multiplicity expression is first parsed with `sympy`, then the parsed expressions are combined using `sum()`. This way each term is parsed a single time instead of `N+1` times, which offers a significant speedup for complex diagrams, especially those with many states.
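The change can be illustrated with a minimal sketch. The multiplicity strings below are made-up placeholders for a hypothetical 3-state model (not KDA output), but they show the difference between re-parsing the normalization for every state and parsing each multiplicity once:

```python
from sympy.parsing.sympy_parser import parse_expr

# Hypothetical state multiplicity strings (illustrative only)
mult_strs = [
    "k21*k31 + k23*k31 + k21*k32",
    "k12*k31 + k12*k32 + k13*k32",
    "k13*k21 + k12*k23 + k13*k23",
]

# Old approach: build the normalized string for each state, so the
# normalization terms get re-parsed N times
norm_str = " + ".join(mult_strs)
old = [parse_expr(f"({m})/({norm_str})") for m in mult_strs]

# New approach: parse each multiplicity once, then combine with sum()
mults = [parse_expr(m) for m in mult_strs]
norm = sum(mults)
new = [m / norm for m in mults]

# Both approaches yield equivalent probability expressions
assert all((a - b).simplify() == 0 for a, b in zip(old, new))
```

The resulting expressions are identical; only the number of string-parsing passes changes.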
I ran the KDA manuscript code (see this comment) on this branch and it completed without issue. I'll go ahead and merge this so we can start taking advantage of the performance benefits.