Becksteinlab / kda

Python package used for the analysis of biochemical kinetic diagrams.
GNU General Public License v3.0
3 stars 1 forks source link

MAINT, TST, ENH: Simplify flux diagram generation algo, update directional diagram algo to leverage `nx.all_simple_edge_paths()` #60

Closed nawtrey closed 2 years ago

nawtrey commented 2 years ago
codecov[bot] commented 2 years ago

Codecov Report

Merging #60 (506cd09) into master (4a8e2e1) will decrease coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #60      +/-   ##
==========================================
- Coverage   99.86%   99.85%   -0.01%     
==========================================
  Files           9        9              
  Lines         723      708      -15     
==========================================
- Hits          722      707      -15     
  Misses          1        1              
Impacted Files Coverage Δ
kda/diagrams.py 100.00% <100.00%> (ø)
nawtrey commented 2 years ago

Mainly targets generate_flux_diagrams() as it had a lot of low hanging fruit in terms of simplifications, mostly for readability, but also for some performance improvements. In addition to this, the directional diagram algo was changed to leverage a NetworkX built-in path finding function which is likely better tested than the previous methods which are still being used in the flux diagram algo. It would be nice to find a way for the flux diagrams to use this same function, but it proved difficult to locate the appropriate sources and targets. The problem is the "target" is an entire cycle, so it isn't clear how to iterate over sources/targets because while the sources should be leaf nodes, we would have to potentially iterate over every node in a cycle and somehow parse out the "correct" path. There might be a way to treat the entire cycle like a single node, but that would require a clever way to map the original nodes to their appropriate paths, and that isn't obvious either. One step at a time, I suppose.

The good news is I think there is a slight performance increase in both diagram algos, albeit anecdotal. Locally, the test suite on master: 61 passed in 72.79s (0:01:12) and on this branch: 62 passed in 70.59s (0:01:10) This is including the new flux diagram test case, which has a considerable number of flux diagrams to create.

If we crudely time generate_directional_diagrams(), using the script below:

```python import time import numpy as np import networkx as nx from kda import graph_utils, diagrams def get_K(N): K = np.ones((N, N), dtype=np.int32) np.fill_diagonal(K, 0) return K def main(n_nodes, iterations): print(f"Running maximally-connected {n_nodes}-state model") K = get_K(N=n_nodes) G = nx.MultiDiGraph() graph_utils.generate_edges(G, K) times = [] for i in range(iterations): start_time = time.perf_counter() diags = diagrams.generate_directional_diagrams(G) end_time = time.perf_counter() elapsed = end_time - start_time times.append(elapsed) diag_count = len(diags) expected_diag_count = n_nodes ** (n_nodes - 1) assert diag_count == expected_diag_count avg_time = np.mean(times) print(f"Diagrams: {diag_count}") print(f"Avg. time over {iterations} iterations: {avg_time:.4f} s") if __name__ == "__main__": iters = 3 n_nodes = 7 main(n_nodes=n_nodes, iterations=iters) ```

On master:

Running maximally-connected 5-state model
Diagrams; 625
Avg. time over 3 iterations: 0.1198 s

Running maximally-connected 7-state model
Diagrams; 117649
Avg. time over 3 iterations: 34.0590 s

On this feature branch:

Running maximally-connected 5-state model
Diagrams: 625
Avg. time over 3 iterations: 0.1454 s

Running maximally-connected 7-state model
Diagrams: 117649
Avg. time over 3 iterations: 29.9647 s

So there appears to be a cost for smaller test cases (though it's hard to say using this method), but for larger cases there seems to be some benefit. With the switch to storing the partial diagrams (instead of their edge tuples) to generate the directional diagrams, I wouldn't be surprised if the peak memory usage is affected. But as long as we aren't storing the directional diagrams this is probably "okay" for now.

nawtrey commented 2 years ago

The code coverage drop is as a result of there being less lines of code overall, which is a net improvement. So that's not a concern.

nawtrey commented 2 years ago

Alright, everything here LGTM. I've reviewed all the changes again and CI is passing, so it's going in.