Becksteinlab / kda

Python package used for the analysis of biochemical kinetic diagrams.
GNU General Public License v3.0
3 stars 1 forks source link

WIP, API: Add simple API for KDA #83

Closed nawtrey closed 2 weeks ago

nawtrey commented 5 months ago

Description

Todos

Upstream Todos (handle separately)

~Fix issue #56 -- remove the usage of "name"/"val" dict keys for all kda functions. I don't think this is a particularly useful feature and it will likely require a lot of code to keep it working. Instead we should simply store the edge value under the NetworkX default "weight" key. There is no reason to store "k12" as an edge weight under "name" for edge (0, 1) when the information is already available in the edge tuple. Also, if we do this we can likely remove both graph_utils.generate_edges(G) and graph_utils.retrieve_rate_matrix(K) and simply use the NetworkX built-in functions (e.g. nx.from_numpy_array(K, create_using=nx.MultiDiGraph)).~

Comments

I'm still working out how to handle some of the cycle-based tasks. I need to figure out a good solution for cycles that allows the user to either input the cycle info (i.e. direction, substrate transport per cycle, etc.) or input general state info and automatically determine the cycle info. I haven't come up with anything novel so I'll likely leave that for later.

As it stands the API is already much easier to use than before, especially for transition fluxes (see script below).

Testing/Example Script

Testing Script ```python import numpy as np import kda if __name__ == "__main__": K = np.array( [ [0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 0], ] ) model = kda.KineticModel(K=K, G=None) model.build_cycles() model.build_partial_diagrams() model.build_directional_diagrams() model.build_state_probabilities(symbolic=True) print(model.K) print(model.G) print(model.cycles) print(len(model.partial_diagrams)) print(model.get_partial_diagram_count()) print(len(model.directional_diagrams)) print(model.get_directional_diagram_count()) for (i, j) in model.G.edges(): flux = model.transition_flux(i=i, j=j, net=False, symbolic=True) print(f">>> j_{i}{j} = {flux}") ```

Status

nawtrey commented 5 months ago

Cycle/Operational Flux Solution 1

Regarding the storage of cycle information -- I think an initial solution might be to simply create a method that lets the user "register" a cycle with all of its information. Something like the following might work:

model = kda.KineticModel(K, G=None)
model.register_cycle(identifier="1", cycle=[1, 2, 3, 4], order=[1, 2], substrate=["H"])

Currently the cycles are stored in a KineticModel as a list of lists of ints, but I could find a way to integrate build_cycles() with register_cycle() so they don't store redundant info. For example, I could update build_cycles() to create a dictionary to store all the cycles info and register_cycle() could simply operate on that dictionary as the user registers cycles. For now I think that's what I will do, but eventually it might be worth making a new object for the cycles if there are enough cycle-specific methods and data we want to store.

As an example of what we want to be able to store, we use the following dictionary to store the cycle info for the EmrE analysis:

# define the cycles manually (for repeatability)
EmrE_cycles = {
    1: {"cycle": [0, 1, 3, 2, 4, 5, 7, 6], "order": [6, 0]}, 
    2: {"cycle": [0, 6, 7, 5, 4, 2], "order": [6, 0]}, 
    3: {"cycle": [0, 6, 7, 5, 3, 2], "order": [6, 0]}, 
    4: {"cycle": [0, 6, 7, 5, 3, 1], "order": [6, 0]},
        5: {"cycle": [0, 2, 4, 5, 3, 1, 7, 6], "order": [6, 0]},
    6: {"cycle": [0, 6, 7, 1, 3, 2], "order": [6, 0]}, 
    7: {"cycle": [0, 6, 7, 1], "order": [6, 0]}, 
    8: {"cycle": [0, 2, 3, 1, 7, 5, 4, 6], "order": [6, 0]}, 
    9: {"cycle": [0, 6, 4, 5, 7, 1], "order": [6, 0]}, 
    10: {"cycle": [0, 6, 4, 5, 3, 2], "order": [6, 0]}, 
    11: {"cycle": [0, 6, 4, 5, 3, 1], "order": [6, 0]}, 
        12: {"cycle": [0, 1, 7, 5, 3, 2, 4, 6], "order": [6, 0]},
    13: {"cycle": [0, 6, 4, 2, 3, 1], "order": [6, 0]}, 
    14: {"cycle": [0, 6, 4, 2], "order": [6, 0]}, 
    15: {"cycle": [0, 1, 3, 5, 7, 6, 4, 2], "order": [0, 2]}, 
    16: {"cycle": [0, 2, 4, 6, 7, 1], "order": [0, 2]}, 
    17: {"cycle": [0, 2, 4, 5, 7, 1], "order": [0, 2]}, 
    18: {"cycle": [0, 2, 4, 5, 3, 1], "order": [0, 2]}, 
    19: {"cycle": [0, 2, 3, 5, 7, 1], "order": [0, 2]}, 
    20: {"cycle": [0, 1, 7, 6, 4, 5, 3, 2], "order": [0, 2]},
        21: {"cycle": [0, 1, 3, 2], "order": [0, 2]},
    22: {"cycle": [1, 7, 6, 4, 5, 3], "order": [1, 3]}, 
    23: {"cycle": [1, 7, 6, 4, 2, 3], "order": [2, 4]}, 
    24: {"cycle": [1, 7, 5, 4, 2, 3], "order": [2, 4]}, 
    25: {"cycle": [1, 7, 5, 3], "order": [1, 3]}, 
    26: {"cycle": [2, 3, 5, 7, 6, 4], "order": [2, 4]}, 
    27: {"cycle": [2, 3, 5, 4], "order": [2, 4]},
        28: {"cycle": [6, 7, 5, 4], "order": [7, 5]},
    }

This is very tedious to create and error-prone, but the benefit of having all the info up front is it makes analysis a lot easier. So I think the goal should be to find a way to get all this info into the KineticModel as simply as possible (from a user perspective) so it is a better experience down the road.

nawtrey commented 3 months ago

Cycle/Operational Flux Solution 2

Perhaps we can require the user to submit the node positions for their graph at creation time as well as a CW/CCW choice.

This provide us a few benefits:

  1. The node positions can be used to determine the correct cycle order for a given cycle since the cycles are basically just sets of nodes where their positions create polygons.
  2. Since we can use the node positions to determine if a cycle is CW/CCW, the user would never need to input cycle "order".

This would require a little more info up front, but if the user ever wanted to graph their diagram (very likely) they would need this information anyways. It would also be readily available now that it is stored in the graph nodes (e.g. pos = nx.get_node_attributes(G,'pos')). The main caveat is that whatever algorithm we use for determining CW/CCW cycles would have to be able to handle a range of polygons including both concave and convex polygons. It looks like there are simple ways to determine this (see here, and particularly this answer).

Now, this feature is only useful in the context of summing net cycle fluxes to create operational flux expressions. To fully automate this procedure, we would need to accompany the CW/CCW cycle handler function with a function that can assign meaning and labels to cycles. Perhaps register_cycles() could simply allow a user to store 1. the cycle node list, 2. the cycle label (e.g. "1", "2", etc.) and 3. the ligand turnover information (e.g. "H", "D", "HD").

With both of these functions implemented (CW/CCW cycle function, register cycles function) a user could do something like the following:

# minimum required diagram info
K = np.array(
    [
        [0, 1, 0, 1],
        [1, 0, 1, 1],
        [0, 1, 0, 1],
        [1, 1, 1, 0],
    ]
)
pos = {0: [1, 1], 1: [-1, 1], 2: [-1, -1], 3: [1, -1]}

# create the model
model = kda.KineticModel(K, G=None, pos=pos, CCW=True)
model.build_cycles()

# register the cycles
cycle_labels = ["a", "b", "c"]
cycle_substrates = [["R"], ["R", "L"], ["L"]]
for cycle, label, substrate in zip(model.cycles, labels, cycle_substrates):
  model.register_cycle(cycle=cycle, label=label, substrate=substrate)

# calculate the operational fluxes
J_L = model.operational_flux(substrate="L", symbolic=True)
J_R = model.operational_flux(substrate="R", symbolic=True)

Behind the scenes, KineticModel.operational_flux() would take the sum of the correct net cycle fluxes by

  1. collecting the cycles that contribute to the flux of L or R (using the registered cycles info),
  2. determining the node order for each contributing cycle according to the chosen CCW=True convention (using the node positions/polygon algo),
  3. finding the net cycle flux for each cycle using existing methods, and
  4. taking the sum of the net cycle fluxes.

I like this solution a fair amount more because it requires less information up front than manually entering all cycles, the info it needs is already required for plotting diagrams, and it is more inline with the theory as desribed by Hill (where you choose a cycle convention up front). It also avoids the risk of upstream cycle-related changes resulting in issues for KDA users since the cycle directions will not rely on upstream code, making it easier to reproduce identical results across users and time (this has been an issue in the past, particularly for kinetic diagrams with multiple Hamiltonian cycles).

nawtrey commented 2 months ago

Tests are in now, only 2 lines are uncovered but they are low priority.

With the size of this PR I'm going to shelve the cycle flux considerations for later and focus on wrapping this up so it is available for use.

nawtrey commented 2 weeks ago

Locally the KDA paper code runs without error:

KDA paper code completed in 179.15 seconds

I'll check off the associated task.

nawtrey commented 2 weeks ago

Okay I think this is ready to go at this point. The API is working as intended and already makes things much easier for a user, especially if transition fluxes are needed. The kda manuscript code still runs without issue, all tests pass (with 100% patch coverage) and the doc pages look good. Any further changes should probably be made in targeted PR's for clarity.

I'll open a separate issue regarding the cycle fluxes API and maybe one for plotting as well. For most users, the diagram plotting is probably not crucial but it would be nice to print the kinetic diagram and cycles.