Becksteinlab / kda

Python package used for the analysis of biochemical kinetic diagrams.
GNU General Public License v3.0
3 stars 1 forks source link

ENH: Update KDA partial and directional diagram generation algorithm #22

Closed nawtrey closed 2 weeks ago

nawtrey commented 3 years ago

Problem

The primary limitation for graph complexity is the space complexity of functions like kda.diagrams.generate_partial_diagrams() and kda.diagrams.generate_directional_partial_diagrams(). This is inherently related to issue #2, where this issue is focused on one part of that issue.

This problem is 2 fold:

  1. the algorithm for generating partial diagrams is inefficient
  2. the functions store more information than they need to

While 2. does contribute to the issue, the main issue is 1., especially for simpler input graphs. At the moment, the partial diagrams are generated by finding every possible combination of N-1 edges then discarding subgraphs that don't fit the definition of a partial diagram. Complete input graphs contain N(N-1)/2 unique edges, and thus have N^(N-2) partial diagrams (after doing the appropriate combinatorics). These yield approximately N! possibilities (using Stirling's approximation), so generating all possibilities is probably appropriate. But for simpler input graphs the number of partial diagrams is much less numerous so generating every possibility is likely not necessary.

Solution(s)

Before I list the possible solutions, I have to mention that in graph theory a "partial diagram" is known as a "spanning tree", so the solutions below will be using the graph theory language.

There are many solutions to this problem, many of which have formulas for calculating the space complexity of the algorithm. At the moment it isn't obvious which method is the best method, so I'm just going to dump links below.

I was going to list more, but the other articles/papers I have are all references in the above paper so for now I will leave it here.

Tasks

nawtrey commented 3 years ago

Here are some links where people have discussed algorithms for collecting all spanning trees:

nawtrey commented 1 month ago

A few more papers to look at for ideas:

nawtrey commented 3 weeks ago

With https://github.com/Becksteinlab/kda/commit/61a1fcbd0aab22f0075a1438e0977af32ed1918f merged I've checked off the directional diagram task.

nawtrey commented 2 weeks ago

Upon further consideration I don't know if generate_partial_diagrams is really a rate-limiting step for KDA right now.

For example, using a maximally-connected 7-state diagram, the current algorithm generated the 16807 partial diagrams in roughly 1 second:

[70.83%] ··· ...alDiagrams.peakmem_generate_partial_diagrams                ok
[70.83%] ··· ============== ======= =======
             --               return_edges
             -------------- ---------------
                 graph        True   False
             ============== ======= =======
                3-state      80.3M   80.3M
              Hill-5-state   80.4M   80.4M
              Hill-8-state   80.6M   80.5M
              EmrE-8-state   80.6M   82.1M
              Max-7-state    81.3M    138M
             ============== ======= =======

[75.00%] ··· ...rtialDiagrams.time_generate_partial_diagrams                ok
[75.00%] ··· ============== ============= ============
             --                    return_edges
             -------------- --------------------------
                 graph           True        False
             ============== ============= ============
                3-state        225±9μs      215±3μs
              Hill-5-state     459±10μs     435±20μs
              Hill-8-state   2.69±0.02ms   2.94±0.2ms
              EmrE-8-state     16.7±1ms    15.3±0.1ms
              Max-7-state     1.07±0.05s    960±10ms
             ============== ============= ============

Comparatively, the new directional diagrams algo requires ~13 seconds to generate both the partial diagrams and directional diagrams:

[54.17%] ··· ...agrams.peakmem_generate_directional_diagrams                ok
[54.17%] ··· ============== ======= =======
             --               return_edges
             -------------- ---------------
                 graph        True   False
             ============== ======= =======
                3-state      80.4M   80.5M
              Hill-5-state   80.7M   80.9M
              Hill-8-state   80.9M   83.5M
              EmrE-8-state   82.5M    102M
              Max-7-state     152M    818M
             ============== ======= =======

[58.33%] ··· ...lDiagrams.time_generate_directional_diagrams                ok
[58.33%] ··· ============== ============ ============
             --                    return_edges
             -------------- -------------------------
                 graph          True        False
             ============== ============ ============
                3-state      623±100μs    791±100μs
              Hill-5-state   2.82±0.2ms   4.14±0.4ms
              Hill-8-state   23.0±0.4ms    46.7±2ms
              EmrE-8-state    179±8ms      328±20ms
              Max-7-state    8.28±0.04s   12.7±0.07s
             ============== ============ ============

So while it may be good to improve the partial diagram algorithm for other reasons, performance is probably not a main concern. If I can find any way to improve partial diagram algorithm performance I will probably close this issue.