Becksteinlab / kda

Python package used for the analysis of biochemical kinetic diagrams.
GNU General Public License v3.0
3 stars 1 forks source link

ENH: Add `KineticModel` method for calculating net cycle fluxes #57

Open nawtrey opened 2 years ago

nawtrey commented 2 years ago

At the moment it is difficult to calculate operational fluxes because each cycle has to have its direction chosen "by hand". For example, the kda.calculations functions calc_thermo_force, calc_pi_difference, calc_net_cycle_flux, etc. require the user to input the "order" of the cycle, which is a deliberately ordered pair of nodes which is compared to the input cycle to determine which direction is the "forward" cycle direction. While this method allows for complete control over the cycle direction, it is very cumbersome and requires the user to manually record node pairs.

A better solution might be to create a function that operates on the input diagram G which goes through each cycle and labels them as "CCW" or "CW" (or some other naming convention) based on the node positions and a user-defined direction. That way a user would only be required to specify the "positive" direction once, and if they attempted to use a function that requires these to be labeled an error could be raised. The inputs to this function would likely be a dictionary containing the information of any ions/substrates that are being transported. For example, dict("H+"=(0, 1)) would mean a proton is transported in the positive direction through the edge (0, 1).

With this information, something like the following could be performed:

  1. store all unique dict keys (substrate names) in the diagram object
  2. iterate over all cycles
  3. check if each cycle contains any of the edges (or their reverse):
    • if the cycle contains only 1 edge (either forward or reverse), label it according to the prescribed direction
    • if the cycle contains no forward/reverse edges, label None
    • if the cycle contains pairs of forward/reverse edges for the same species, label None
    • if the cycle contains a combination of edges from multiple species, add unique labels for each species (based on direction)

We might have to temporarily store all of the "positive" and "reverse" edge tuples for each cycle but hopefully that can be avoided.

This is probably a naïve solution, but storing this data along with the cycle data (maybe even in a new object) will help prevent user errors using the current method, and allow for easier cycle verification before calculations are performed. It might even warrant a special plotting function that draws each cycle and (where applicable) a CW/CCW arrow for each substrate it transports. That would make it very simple to double check things by eye.

nawtrey commented 2 years ago

One consideration is for cycles that move 2 protons (for example), we would likely need to store that value. So it would really be nice to store an integer for each species in each cycle in the range [-2, 2] which indicates its direction and magnitude. So a -2 would be moving 2 substrates in the CW direction, and a +1 would move a single substrate in the CCW direction. Of course this would require that we define "CCW is positive" for all cycles, which should be avoided I think. I'll have to think about this more.

nawtrey commented 5 months ago

I think there are 2 issues here that are being conflated:

  1. how to assign directions to the cycles in a self-consistent manner.
  2. how to attribute a sign to each cycle flux when calculating other fluxes, like operational fluxes

The first issue is easy to handle -- all cycles should follow the same convention (i.e. CCW is positive). This is how it is done in the literature, and I think it is overall a simpler strategy.

The second issue is a little more complicated, and I think that is what I was addressing in previous comments. When generating flux expressions from net cycle flux expressions, some net cycle fluxes will require a negative sign based on the relative orientation of their corresponding cycles. For example, in the following diagram

1 ------- 2
|    a    |
3 ------- 4
|    b    |
5 ------- 6

the net transition flux for 3->4 is J_34 = J_a - J_b. Thus, cycle b requires a negative sign attributed to it. This is fairly common for operational flux calculations, and a programmatic solution would be desirable. Perhaps networkx offers a solution for something like this but for now it is just an idea. The alternative is to just require the user to input more graph information up front which wouldn't be a bad solution if the amount of required info was kept to a reasonable level.

nawtrey commented 2 weeks ago

With GH-83 merged, I think this issue can change direction slightly, and focus primarily on adding a new KineticModel method that allows the user a simple interface for calculating net cycle fluxes.

The difficulties discussed above are still relevant, as well as the following comment from #68:

...when generating the net cycle fluxes, you need to know the "order" (i.e. direction) for each cycle (see issue https://github.com/Becksteinlab/kda/issues/57), which is input as a pair of nodes in the cycle. Then once the net cycle fluxes are generated, there is no real way of linking one cycle to the next, making it difficult to generate the operational fluxes for a given system. If we can find a way to input more data about the system (or create & store it in the diagram object) up front, we could make this process a lot easier.

Solutions have been discussed in GH-83 (here and here), where the latter is probably more tractable. I'll paste it here for convenience:

Cycle/Operational Flux Solution 2

Perhaps we can require the user to submit the node positions for their graph at creation time as well as a CW/CCW choice.

This provide us a few benefits:

  1. The node positions can be used to determine the correct cycle order for a given cycle since the cycles are basically just sets of nodes where their positions create polygons.
  2. Since we can use the node positions to determine if a cycle is CW/CCW, the user would never need to input cycle "order".

This would require a little more info up front, but if the user ever wanted to graph their diagram (very likely) they would need this information anyways. It would also be readily available now that it is stored in the graph nodes (e.g. pos = nx.get_node_attributes(G,'pos')). The main caveat is that whatever algorithm we use for determining CW/CCW cycles would have to be able to handle a range of polygons including both concave and convex polygons. It looks like there are simple ways to determine this (see here, and particularly this answer).

Now, this feature is only useful in the context of summing net cycle fluxes to create operational flux expressions. To fully automate this procedure, we would need to accompany the CW/CCW cycle handler function with a function that can assign meaning and labels to cycles. Perhaps register_cycles() could simply allow a user to store 1. the cycle node list, 2. the cycle label (e.g. "1", "2", etc.) and 3. the ligand turnover information (e.g. "H", "D", "HD").

With both of these functions implemented (CW/CCW cycle function, register cycles function) a user could do something like the following:

# minimum required diagram info
K = np.array(
    [
        [0, 1, 0, 1],
        [1, 0, 1, 1],
        [0, 1, 0, 1],
        [1, 1, 1, 0],
    ]
)
pos = {0: [1, 1], 1: [-1, 1], 2: [-1, -1], 3: [1, -1]}

# create the model
model = kda.KineticModel(K, G=None, pos=pos, CCW=True)
model.build_cycles()

# register the cycles
cycle_labels = ["a", "b", "c"]
cycle_substrates = [["R"], ["R", "L"], ["L"]]
for cycle, label, substrate in zip(model.cycles, labels, cycle_substrates):
  model.register_cycle(cycle=cycle, label=label, substrate=substrate)

# calculate the operational fluxes
J_L = model.operational_flux(substrate="L", symbolic=True)
J_R = model.operational_flux(substrate="R", symbolic=True)

Behind the scenes, KineticModel.operational_flux() would take the sum of the correct net cycle fluxes by

  1. collecting the cycles that contribute to the flux of L or R (using the registered cycles info),
  2. determining the node order for each contributing cycle according to the chosen CCW=True convention (using the node positions/polygon algo),
  3. finding the net cycle flux for each cycle using existing methods, and
  4. taking the sum of the net cycle fluxes.

I like this solution a fair amount more because it requires less information up front than manually entering all cycles, the info it needs is already required for plotting diagrams, and it is more inline with the theory as desribed by Hill (where you choose a cycle convention up front). It also avoids the risk of upstream cycle-related changes resulting in issues for KDA users since the cycle directions will not rely on upstream code, making it easier to reproduce identical results across users and time (this has been an issue in the past, particularly for kinetic diagrams with multiple Hamiltonian cycles).

nawtrey commented 5 days ago

More info regarding cycle directions --

A simple way to calculate the orientation of a cycle in a 2D plane is the shoelace formula. There's a nice blog that discusses it too. This answer on StackOverflow covers a potentially simpler calculation, which is explained here.

NetworkX also has a class for planar graphs (i.e., graphs that can be drawn in 2D such that edges only intersect at vertices) that has methods which can had "half edges" with CW or CCW orientations labeled. I think the requirement that edges cannot intersect may be overly restrictive, but we may be able to implement a similar system for KDA where edges are initialized with a direction attribute (either "CW" or "CCW"), this would just have to be done in a consistent manner.