SciML / Catalyst.jl

Chemical reaction network and systems biology interface for scientific machine learning (SciML). High performance, GPU-parallelized, and O(1) solvers in open source software.
https://docs.sciml.ai/Catalyst/stable/

Project: Qualitative Analysis #43

Closed chethega closed 2 months ago

chethega commented 6 years ago

It would be nice to implement various algorithms for qualitative analysis of chemical reaction networks. So, this issue is for brainstorming:

  1. What kind of qualitative analyses would you like to see? Preferably with a link to papers, if anyone has published it before.
  2. What kind of qualitative analyses could you contribute?
  3. What kind of support from the existing framework do we need?

I would plan to implement BF16 (other potential references: OM16, OTM17); in short, this takes a network with qualitative kinetics (we only know stoichiometry, and know on which concentrations each reaction rate depends), and outputs a graph that describes which steady-state concentrations and steady-state reaction rates depend on which parameters of the system, and also generates some bounds on the kind of possible bifurcations from the steady state. This graph is pretty structured: it has a transitivity property that allows nice visualizations.

I would further plan to implement some variant of "Feinberg Shinar concordance" analysis. I recommend using this as a Google search term; I know of no readable paper on this topic that I can recommend (rather than subjecting any of you to the pain of trying to really understand more than the abstract, I'd recommend waiting until I post some code).


What kind of things do I need from the network:

I would need to extract, for each reaction, stoichiometry and dependencies (on which concentrations does the reaction rate depend).

Concordance analysis would also require monotonicity information (each reaction rate can depend on each concentration either positively, negatively, with unspecified sign, or not at all). Unless we do some extensive interval math, I'd guess that this information would need to be user supplied.

For convenience, one would need a nice way to "clamp" concentrations (remove them from the state-space by considering them constant).

If we want to do some more exotic Feinberg-style analyses which only apply to mass-action kinetics, we would also need to know which reactions are supposed to have strict mass action kinetics. This would need to be a user-supplied parameter: Most of the time, you consider mass-action as an approximation of some unfathomably complicated kinetics; sometimes (rarely) you want to consider it physical/mathematical truth.
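The qualitative information discussed above (stoichiometry, signed rate dependencies, clamping) could be represented roughly as follows. This is a language-agnostic Python mock-up, not an existing DiffEqBiological API; the type names, fields, and the `clamp` helper are all illustrative.

```python
from dataclasses import dataclass, field

# Signs for how a rate depends on a concentration:
# "+" monotone increasing, "-" decreasing, "?" unspecified; absent = no dependence.
@dataclass
class QualReaction:
    name: str
    stoich: dict   # species -> net stoichiometric change (int)
    deps: dict     # species -> sign in {"+", "-", "?"}

@dataclass
class QualNetwork:
    species: list
    reactions: list = field(default_factory=list)

    def clamp(self, clamped):
        """Remove species from the state space by treating them as constants."""
        kept = [s for s in self.species if s not in clamped]
        rxns = []
        for r in self.reactions:
            stoich = {s: v for s, v in r.stoich.items() if s not in clamped}
            # dependencies on clamped species effectively become parameters
            deps = {s: v for s, v in r.deps.items() if s not in clamped}
            rxns.append(QualReaction(r.name, stoich, deps))
        return QualNetwork(kept, rxns)

# X + Z --> Y + Z: net change in X and Y only, rate increasing in X and Z.
net = QualNetwork(["X", "Y", "Z"],
                  [QualReaction("r1", {"X": -1, "Y": +1}, {"X": "+", "Z": "+"})])
clamped = net.clamp({"Z"})
print(clamped.species)            # Z removed from the dynamical state
print(clamped.reactions[0].deps)  # Z no longer appears as a dependency
```

Clamping here just drops the clamped species from both the state list and the per-reaction data; a real implementation would additionally record the clamped value as a parameter.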

If we want the comparison between different variants of networks from section 5 of https://arxiv.org/abs/1606.00279, then we would need some support from the DSL. I am not sure what this should look like.

Afterwards, we would like to visualize the output of the analysis, and store it in some usable data structure. We would further like to use it for the steady state solve, and if parameter-tracking / bifurcation analysis ever becomes a thing in diffeq, then we would like to use it there as well.

For the steady state solver, the result looks like the following: We factorize our big, coupled nonlinear system into a block tridiagonal structure; that is, we have a list of lower-dimensional systems f_i(x_1,..., x_k)==0 (each f_i and x_j is multi-dimensional), and a transitive relation isless, such that f_i(x)=f_i({x_k: isless(k,i)}). Obviously this severely constrains the kind of bifurcations or multiple steady states we can have.
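The factorized solve described above can be illustrated with a toy example: instead of attacking the full coupled system, solve the blocks one at a time in an order consistent with isless, feeding each block the already-solved earlier blocks. This is a purely illustrative Python sketch with scalar blocks and a throwaway bisection root finder; nothing here is an actual solver API.

```python
def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f on [lo, hi] by bisection (f(lo) and f(hi) must differ in sign)."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) == 0 or hi - lo < tol:
            return mid
        if (f(mid) > 0) == (flo > 0):
            lo, flo = mid, f(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Subsystems listed in an order consistent with the `isless` relation:
# f_1 depends only on x1, f_2 on (x1, x2), f_3 on (x1, x2, x3).
blocks = [
    lambda x: x[0] ** 2 - 4,         # x1 = 2
    lambda x: x[1] - 3 * x[0],       # x2 = 3*x1 = 6
    lambda x: x[2] - (x[0] + x[1]),  # x3 = x1 + x2 = 8
]

x = []
for f in blocks:
    # each block sees only the already-solved blocks plus its own unknown
    xi = bisect(lambda v: f(x + [v]), 0.0, 100.0)
    x.append(xi)

print(x)  # approximately [2.0, 6.0, 8.0]
```

Each one-dimensional solve replaces what would otherwise be one slice of a fully coupled three-dimensional root-finding problem, which is the point of the block structure.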

TorkelE commented 6 years ago

I agree that this is a direction in which I would like to move the package (not only being a platform for making simulations, but providing some additional, related, utility).

The reaction_network type already contains a field reactions; this is a vector containing a set of ReactionStruct items. Each ReactionStruct in turn contains the reaction rate (an expression, which may be a single number like 2.0, a parameter like k11, or a longer expression possibly containing other reactants, like v1*X/(2.0+X)). This rate is corrected for mass-action kinetics.

It also contains two arrays of the ReactantStruct structure, one called substrates (the substrates of the reaction) and the other called products (the products of the reaction). ReactantStruct contains two fields: the first is a symbol (representing the actual chemical species, like X, what you write in the system), and the second is an Int, the stoichiometry of that reactant in this reaction, e.g. 2.

So if we have this network:

```julia
model = @reaction_network rType begin
    (mm(Z,v,K), 2.0), X + Y ↔ 2Z
end v K
```

then we have a reactions field (an array with two entries). The rates would look like:

```
v*Z/(K+Z)*X*Y
2.0*Z^2
```

The first reactions entry would contain a substrates field with two ReactantStructs, the first with reactant field X and the second with reactant field Y; both would have stoichiometry field 1. The reaction would have a single entry in its products array: a ReactantStruct with reactant Z and stoichiometry 2. The second reactions entry would have the same substrates and products, but reversed.

Would it be possible for you to use this array to extract the information you need? The substrates and products arrays would give you all the information about stoichiometric change, and the rate would tell you which reactants the reaction depends on. (I would recommend using only the rate field for this: it is possible to declare reactions which do not use the mass-action kinetics of their substrates for their rate. People will probably use that only rarely, but since all the info is in the rate expression it is better to avoid any risks.)

The reaction_network also contains two arrays of symbols, syms and params, containing a list of all reactants and parameters, respectively. These might also be of use.
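To illustrate the kind of extraction discussed above, here is a language-agnostic Python mock-up of the described fields (a rate expression plus substrates/products with stoichiometries). The dict layout is a stand-in for the actual ReactionStruct types, not the real implementation, and the token-matching dependency check is just one simple way to read "which reactants appear in the rate".

```python
import re

# Stand-in for the structures described above: each reaction has a rate
# expression string and substrates/products as (species, stoichiometry) pairs.
reactions = [
    {"rate": "v*Z/(K+Z)*X*Y", "substrates": [("X", 1), ("Y", 1)], "products": [("Z", 2)]},
    {"rate": "2.0*Z^2",       "substrates": [("Z", 2)], "products": [("X", 1), ("Y", 1)]},
]
species = ["X", "Y", "Z"]

def net_stoichiometry(rxn):
    """Net change per species: products minus substrates."""
    change = {}
    for s, n in rxn["substrates"]:
        change[s] = change.get(s, 0) - n
    for s, n in rxn["products"]:
        change[s] = change.get(s, 0) + n
    return change

def rate_dependencies(rxn, species):
    """A species is a dependency if its symbol appears in the rate expression."""
    tokens = set(re.findall(r"[A-Za-z_]\w*", rxn["rate"]))
    return [s for s in species if s in tokens]

print(net_stoichiometry(reactions[0]))           # {'X': -1, 'Y': -1, 'Z': 2}
print(rate_dependencies(reactions[0], species))  # ['X', 'Y', 'Z']
```

In the real package the rate is a Julia expression rather than a string, so "appears in the expression" would be a walk over the expression tree instead of a regex.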

Let's continue this discussion.

chethega commented 6 years ago

How would I write X --> Y catalyzed by Z? One could either write X + Z --> Y + Z, or one could write f(Z), X --> Y. The former strongly suggests positive catalyzation, while the latter also allows inhibition of the reaction (e.g. by inhibiting some enzyme whose concentration is modeled as a parameter instead of dynamical state). How do I extract the fact that the rate depends on Z?

An obvious approach for purely qualitative analysis would be to have a dummy function that generates the necessary information, in order to write qual(U,V; monotone=(pos, unspec)), X-->Y when we do not want to provide a specific kinetics; and for qualitative analysis one would need to construct such objects from the explicit expressions for the reaction rates.

How do I write a continuous feed into a reactor (the flux would be a parameter)? Naively, I would be tempted to write something like -->X.

Is it reasonably possible to optionally name reactions, with another symbol? That would be quite necessary for readable output that involves reaction rates and not just concentrations.

TorkelE commented 6 years ago

How would I write X --> Y catalyzed by Z? There are several options; e.g. these two are equivalent:

```julia
k11, X + Z --> Y + Z
k11*Z, X --> Y
```

I would probably use the second to keep the regulatory term away from the equation. If you want some more complicated regulation you could also use

```julia
f(Z), X --> Y
```

You can extract the fact that the reaction depends on Z by checking the rate expression corresponding to the reaction. Since this expression contains a Z (in all three cases) the reaction depends on Z.

I think we could use two main approaches for this.

1) Providing a large number of ready-made functions. E.g. Hill functions are already implemented, which means that you can write

```julia
hill(Z,v,K,n), X --> Y
```

(with v, K, and n declared as parameters). We know how this function works and can have that information in the program. It would probably be both possible and desirable to have a majority of the functions that people want to use already implemented in the program (including information on their monotonicity).

2) If you want to use a custom function you already have to declare it using

```julia
@reaction_func my_hill_repression(x, v, k, n) = v*k^n/(k^n+x^n)
r = @reaction_network MyReactionType begin
    my_hill_repression(x, v_x, k_x, n_x), 0 --> x
end v_x k_x n_x
```

It would be possible to add an optional argument to the @reaction_func macro to designate monotonicity. I think these two cases allow us to handle monotonicity without much problem.

Both of these notations are valid:

```julia
k1, 0 --> X
k2, ∅ --> X
```

(just like you can write the reverse for degradation)

Right now reactions do not have a name, nor an option to add one. It would probably be possible to add at some point if it is needed.

chethega commented 6 years ago

Thanks, so we can really get everything we need!

Giving optional names (a Symbol) to reactions is useful because I'd like the output and network to correspond to the names or numbering schemes used in the source papers (if typing in by hand) or SBML; this just makes talking about networks so much easier.

I see that the list of species is auto-generated from the list of reactions. That makes sense! Sometimes I'd like a fixed order, though. There are some normal forms that depend on the ordering of species and reactions, if we want it to be unique up to identity and not some complicated notion of equivalence.

When I did qualitative analysis in Python, I wrote a very simple line-based DSL for chemical reaction networks (with a line-based parser and generator, using mostly regexps). I then either typed in my models directly, or I used python-libSBML to read in models from databases, and then used the generator to translate these into my DSL (obviously only translating a tiny subset of SBML features). The ordering was done by having an optional "species metadata" section; when reading a file without this section, I auto-generated it from the reaction data, then wrote back the file, in order to conveniently modify this section by hand (the code is a smoldering unmaintainable abandoned ruin). This way of "pretty printing / normalizing / making optional arguments explicit" and then modifying the normalized file by hand was a nice workflow, though. Not sure whether this workflow can be replicated; I think the DSL can contain arbitrary Julia expressions, and I'm not gonna write a parser+generator for Julia.

I think the constraints for the DSL are: (1) valid Julia expressions, because we piggy-back on the Julia parser; (2) human-readable (which would profit from a little more verbosity); (3) human-writable (which means terse and boilerplate-avoiding).

Is this correct?

TorkelE commented 6 years ago

It is true that the species are extracted directly from the reactions, with the result that the order of the species becomes the order in which they appear in the reactions. Right now this only matters if you want them to appear in a specific order for Latexify, but if more features are added there might be more instances where it matters. For the large class of networks that have e.g. degradation for all species, the order is easy to fix by beginning the network with a degradation term:

```julia
r = @reaction_network MyReactionType begin
    deg, (X,Y,Z) --> 0
end
```

I am hesitant to e.g. require the user to input such an order by default, since this would require a lot of additional input which would not be used in 90% of all cases.

The best solution would probably be to have a reorder function:

```julia
reorder(model, [Z,X,Y])
```

which orders things as the user wants.

Next, concerning the names of reactions: for now it would probably be easiest to have a separate vector with reaction names, which would be passed to additional functions like:

```julia
model = @reaction_network MyReactionType begin
    (p,d), X ↔ 0
end
names = [:prod, :degrad]
graph = makeGraph(model, names)
```

If we then get these methods in place, in the long run we might think of other ways of inserting names, either as a function or as an optional part of the initiation:

```julia
model1 = @reaction_network MyReactionType begin
    (p,d), X ↔ 0
    X --> Y
end
set_names(model1, [:prod, :degrad, :activation])

model2 = @reaction_network MyReactionType begin
    (p,d), X ↔ 0, (prod, degrad)
    X --> Y, activation
end
```
chethega commented 6 years ago

A third (optional) parameter sounds nice. If we want to be future-safe, maybe it is time to go for keywords (who knows what kind of optional metadata we may want to support in the future). Brainstorming:

```julia
@reaction_network MyReactionType begin
    @species_list (X, Z)  # ensure that X is ordered before Z
    (p,d), X ↔ 0, @names (prod, degrad)
    X --> Y      # this one is anonymous and will get e.g. the name anon_reaction3, and the rate will get a parameter anon_parameter3
    Z + Y --> X  # also anonymous, will get e.g. the name anon_reaction4 and the rate will get a parameter anon_parameter4
    @parameter_list (p, d)  # this ensures that parameter p is listed before parameter d
end
```

If we now read in this network, and afterwards write it back (from data-structure to text-file), we might get something like:

```julia
@reaction_network MyReactionType begin
    @species_list (X, Y, Z)
    @parameter_list (p, d, anon_parameter3, anon_parameter4)
    (p,d), X ↔ 0, @names (prod, degrad)
    anon_parameter3, X --> Y, @name anon_reaction3
    anon_parameter4, Z + Y --> X, @name anon_reaction4
end
```

or (potentially also valid!):

```julia
@reaction_network MyReactionType begin
    @species_list (Y, X, Z)
    @parameter_list (anon_parameter3, p, anon_parameter4, d)
    (p,d), X ↔ 0, @names (prod, degrad)
    anon_parameter3, X --> Y, @name anon_reaction3
    anon_parameter4, Z + Y --> X, @name anon_reaction4
end
```

Personally, I would prefer to spec lexicographic ordering in cases where there is no user-supplied ordering (or where the user-supplied ordering is incomplete, like in the above example), and I would prefer to have a specced verbose output format that makes all defaults explicit (and makes x -> parse(write(x)) the identity and x -> write(parse(x)) idempotent). This also helps a lot when debugging networks.
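The round-trip property can be made concrete with a toy normalizer. This Python sketch is purely illustrative (a made-up one-reaction-per-line format, not the actual DSL): parse turns text into data, write emits the normalized text, and the normalized text is a fixed point of write∘parse.

```python
def parse(text):
    """Toy 'parser': reactions written as 'rate, A --> B', one per line."""
    rxns = []
    for line in text.strip().splitlines():
        rate, arrow = line.split(",", 1)
        lhs, rhs = arrow.split("-->")
        rxns.append((rate.strip(), lhs.strip(), rhs.strip()))
    return rxns

def write(rxns):
    """Toy generator: emit the normalized form, one reaction per line."""
    return "\n".join(f"{rate}, {lhs} --> {rhs}" for rate, lhs, rhs in rxns)

src = "  k1 ,X --> Y\nk2,   Y-->0"
normalized = write(parse(src))

# write(parse(x)) is idempotent: normalizing twice changes nothing,
# and parse(write(x)) reproduces the parsed model exactly.
assert parse(normalized) == parse(src)
assert write(parse(normalized)) == normalized
print(normalized)
```

The same two assertions, stated over the real DSL, are exactly the spec properties suggested above; making defaults explicit in write is what makes the second one hold.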

Do we have an ASCII synonym for ↔? (Asking because one might end up working on DSL files from various languages/tools, and not all of them are convenient with fancy symbols.)

TorkelE commented 6 years ago

We might start thinking about keywords; as we are getting ideas for options like that, they are probably the way to go. If one wants to designate a whole lot of stuff one might get a very long line though, e.g.

```julia
model = @reaction_network type=rn_type_3 noise_scaling=Eta species_order=[Z,X,Y] some_other_option=whatever begin
    (p,d), X ↔ Y
    k11, X --> Z
end p d k11
```

in that case

```julia
model = @reaction_network type=rn_type_3 noise_scaling=Eta begin
    (p,d), X ↔ Y
    k11, X --> Z
end p d k11
set_species_order(model, [Z,X,Y])
some_other_option(model, whatever)
```

would be shorter. I am generally in favour of allowing for as concise a notation as possible. @ChrisRackauckas What do you think of implementing keywords and having that as the future system for options? We could declare noise scaling and type that way as well (for now we could leave the old options there to avoid breaking people's stuff).

@chethega Note that parameters (and their order) are declared at the end.

For now, though, we can state that adding things like that is doable, either through keyword options or through additional functions/macros which can be used on the constructed network. With that cleared up, we can begin thinking about the implementation of additional functionality, and then whether we can accept some of the inputs required for those networks already in the first macro.

(If we have a function

```julia
some_cool_feature(model, options)
```

we can then see if the options can be input already in the model declaration, and then make

```julia
some_cool_feature(model) = some_cool_feature(model, model.options)
```

)

chethega commented 6 years ago

Re debugging networks: I encountered the following bug classes:

  1. Handwritten models have typos. A normalization scheme that lists species and parameters is quick at catching those.

  2. Imported models (also handwritten ones copied from papers) often lack necessary degradations that are "obvious" for biologists but prevent steady states. Often these are not really "reactions", but rather species diffusing away. The same holds for missing feeds.

  3. Imported models (also handwritten ones copied from papers) often lack sufficient annotations about which concentrations are not supposed to be dynamical state. For example: sure, this reaction produces CO2 as a side-product (god bless import of reaction pathways from databases), so let's get the stoichiometry correct. Ah, but the CO2 concentration is really supposed to be a static parameter of the model; and now we have introduced spurious interactions and, if degradation through diffusion of CO2 is also missing, spurious constraints.

I once did a short informal test on the curated section of the biomodels database. I think that a majority of models, when imported from SBML, had bugs. Making debugging of models convenient is important (automated tools for this are out of scope, but preventing extreme pain should be in scope).

So an optional @ignored_species list (empty by default) would be quite nice as well (import the reaction from database, then mix-and-match what I want to include into my model, without having to edit the imported reaction).

TorkelE commented 6 years ago

I am not really sure I understand? You want an option to automatically read text files with potentially different notation and make networks of them?

ChrisRackauckas commented 6 years ago

IMO trying to stick everything into the macro is going to be difficult. I think we need to map a lot of the structure that we have to a ModelingToolkit.jl style. Using a structured IR format would be much easier to keep extending. I think @AleMorales has been working on this and it would be nice to check in with where that is. A macro form could be short hand for a much longer detailed form of course.

isaacsas commented 6 years ago

@ChrisRackauckas's comment is exactly what I was going to discuss. It seems like what is really needed is a well-documented representation for storing and getting information about a reaction network (species, products, stoichiometry, rate law type, etc.). Then we can all think about building our needed functionality on top of it (I need stuff like knowing, for each species, which reactions depend on it, or knowing, for each reaction, which other reactions have rate laws that should be recalculated when the first reaction occurs...). @chethega probably needs even more info.

Once we've got a nice way for actually storing and accessing basic network info we can start building that added functionality. It would also provide a target for SBML importers and such to output to.

EDIT: Of course, I also think a lot of the functionality we need may be common. For example, reaction dependency graphs. It makes sense for those to be defined on top of the reaction network representation and made available here. It would be redundant for network analysis codes and SSA codes to each create their own routines for calculating dependencies.

chethega commented 6 years ago
```julia
model = @reaction_network type=rn_type_3 noise_scaling=Eta begin
    (p,d), X ↔ Y
    k11, X --> Z
end p d k11
set_species_order(model, [Z,X,Y])
some_other_option(model, whatever)
```

is executable code and needs Julia.

```julia
model = @reaction_network type=rn_type_3 noise_scaling=Eta begin
    (p,d), X ↔ Y
    @species_order [X,Z,Y]
    k11, X --> Z
    @some_other_option whatever
end p d k11
```

is also relatively easy to parse and generate in a decrepit python2.7 script. So I'd guess we want both: Functions to modify models in-memory, and DSL support to describe models in text-files.

You want option to automatically read text files with potentially different notation and make networks of them?

Long term, sure: We can get a model (in IR/struct form) either from the DSL, by executing a macro / source file, or by programmatically constructing the model, e.g. by using libSBML to painstakingly extract info from databases. For bonus points, we should make a way to output a model (in IR/struct form) in the DSL, because no one wants to read and edit serialized object dumps (that's the point of having a DSL that is actually a file format with a specification and not just a Julia macro). Having a non-terrible way of outputting models for various other tools, e.g. auto (the bifurcation tracking tool) or the file format/DSL used in the Feinberg chemical reaction network toolbox, is, long term, also a good thing.

Ok, I guess this discussion is getting out of hand: We actually have two topics, (1) qualitative analysis and (2) where to go with in-memory-models, IR and DSL.

@isaacsas What kind of qualitative analyses would you plan to run?

isaacsas commented 6 years ago

Ok, I guess this discussion is getting out of hand: We actually have two topics, (1) qualitative analysis and (2) where to go with in-memory-models, IR and DSL.

That's right. I think the issue is maybe we are now hitting a point with lots of different applications that are all interested in using reaction networks (in different ways). It seems like getting a standardized/documented/queryable representation for the reaction network could help us all in then building our functionality on top of it.

@isaacsas What kind of qualitative analyses would you plan to run?

I'm primarily simulating biological network models as continuous-time jump processes and focusing on the SSAs in DiffEqJump, so more quantitative analysis. My actual research is focused on simulating them as spatial models with both particle diffusion and reaction, codes which I'm hoping to ultimately port over to Julia from C++. Optimized SSAs need just a bit of network info (essentially the dependency graphs I mentioned above).

I haven't worked on qualitative analyses as you describe, but I think as a modeler I would be very interested in having tools that let me easily calculate, from just the network structure, what kinds of qualitative behavior are possible, especially in the stochastic model context. For example, I know there are results for systems with absolute concentration robustness about the occurrence of extinction events. Are there algorithms for easily determining whether a network has absolute concentration robustness? Similarly, there are results about the existence of product-form stationary distributions in (mass action) networks with zero deficiency, so that is another property I would like to know about. I admit, though, I haven't looked at such analyses before, so I don't know how easy it is to make code that checks for these properties.
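Some of these checks are mechanical. The deficiency of a network, for instance, is δ = n − l − s, where n is the number of distinct complexes, l the number of linkage classes (connected components of the complex graph), and s the rank of the stoichiometric matrix. A rough Python sketch, purely illustrative (the reaction encoding and function names are made up, not a library API):

```python
from fractions import Fraction

def rank(mat):
    """Matrix rank via fraction-exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def deficiency(reactions, species):
    """reactions: list of (substrate_complex, product_complex), each complex a
    dict species -> stoichiometric coefficient. Returns n - l - s."""
    freeze = lambda c: tuple(sorted(c.items()))
    complexes = {freeze(c) for r in reactions for c in r}
    n = len(complexes)
    # linkage classes: connected components via union-find over complexes
    parent = {c: c for c in complexes}
    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c
    for sub, prod in reactions:
        parent[find(freeze(sub))] = find(freeze(prod))
    l = len({find(c) for c in complexes})
    # rank of the stoichiometric matrix (one reaction vector per row)
    mat = [[prod.get(s, 0) - sub.get(s, 0) for s in species]
           for sub, prod in reactions]
    return n - l - rank(mat)

# A + B <--> C, modelled as two irreversible reactions: deficiency zero.
rxns = [({"A": 1, "B": 1}, {"C": 1}), ({"C": 1}, {"A": 1, "B": 1})]
print(deficiency(rxns, ["A", "B", "C"]))  # 0
</```

Here n = 2 (the complexes A+B and C), l = 1, and the stoichiometric matrix has rank 1, so δ = 0; the zero-deficiency results mentioned above would then apply for mass-action kinetics.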

TorkelE commented 6 years ago

@chethega Maybe I understand: you suggest we should introduce a notation which is not Julia-dependent, or at least where all the functionality can be reached without Julia code?

Now I agree we should maybe structure the various semi-parallel and still interdependent discussions/issues that are going on.

ChrisRackauckas commented 6 years ago

ModelingToolkit.jl is pretty much "done" in terms of standard features. We'll be adding things like event handling and components for a fully Modelica-compatible IR, but I don't think that we'll need to make use of those since that's more for large DAEs. The rest of what still needs to be worked on are automatic transformations, like DAE index reduction, automatic addition of sensitivity analysis ODEs, etc., but those are things that will just be free features in the future. Even "advanced necessary" features, like Jacobian and inverse-Jacobian calculations, already exist. So yes, let's get feedback from @AleMorales on where he is with the reaction-based DSL on top of it, and on how we should start designing it.

Just to get everyone up to speed here, what I've noticed with DSLs is that the macro parsing front end isn't the hard or interesting part. The hard part is the internal representation and the setup to analyze and compile equations. ModelingToolkit.jl is an intermediate representation (IR) to hold the equations in a structured form to perform analysis and compilation to different targets. Think of it like a "model compiler" instead of a DSL: it's made so you can stick other DSLs on there.

In there you have AbstractSystems which are composed of equations (represented by Operations) and Variables. So the basic structure of what we'd want for the bio part is we'd want to create a ReactionSystem which takes in an array of reactions (arrow equations) over the reactant and parameter variables. We can have the same macro as we have now allow for parsing more complicated arrow expressions (exactly how to handle the rates choices is not something I know of right now, which is why I am curious if @AleMorales found a good notation). Then that ReactionSystem is a context-aware representation of the interactions.

On systems, the interface is to define transformations that are system -> system. So one transformation could be to build and add an interaction network. Another could be to generate the DiffEqSystem (currently only ODEs, but when we get far enough I can quickly add the SDE support). The DiffEqSystem then has its own ways of generating Jacobians etc., so we only need to build that DiffEqSystem and the rest comes for free.

In this format, the equations are stored as a computational graph, so it's all parsable and anything you want is available. If the SBML reader directly writes to this computational graph, then we'd have everything we ever wanted and then some.

TorkelE commented 6 years ago

Good to hear about the ModelingToolkit, I will wait to see what @AleMorales says and we can discuss that further.

What you are saying is that the problem is to decide exactly what we want our reaction_network structure to contain. After that we can figure out what the macro that creates that structure should look like. We could also have several separate macros creating the same structure, where which one to use depends on the purpose/preference of the user. Do I understand it right?

So, moving back to the topic: what would be good is if people said what kind of "stuff" they would like to implement for a reaction_network, as well as what kind of information they would require to be available in the IR for that to work.

When we know what is good to have in the network we can decide how to represent that information in the IR (as well as how to produce it). We can also see if there are several very similar things which could simply be a single field in the IR.

Finally, from there we can start discussing fancy ways of inputting this information through a macro? (Which would probably be a while from now. We can of course discuss it already now as well, but it would be good to know what we actually need.)

ChrisRackauckas commented 6 years ago

What you are saying is that the problem is to decide exactly what we want our reaction_network structure to contain. After that we can figure out what the macro that creates that structure should look like. We could also have several separate macros creating the same structure, where which one to use depends on the purpose/preference of the user. Do I understand it right?

Yeah, if we focus on the intermediate representation then the input is then just a choice. SBML, a macro, or a lower-level form. Each can have their own strengths and weaknesses, but build the same objects for analysis.

Finally, from there we can start discussing fancy ways of inputting this information through a macro? (Which would probably be a while from now. We can of course discuss it already now as well, but it would be good to know what we actually need.)

I assume it would mostly be the DSL you've already made. That seems to cover everything I've ever wanted and more. If we just make the macro write a ReactionSystem then we're good (as long as that ReactionSystem has compilation targets to JumpProblem and transformations to DiffEqSystem, then we get feature-completeness plus the other things we've wanted to fix!).

ChrisRackauckas commented 6 years ago

For the network, I am 90% sure that we'll want something that interfaces with LightGraphs.jl. It allows you to design a graph for its AbstractGraph interface, so we can use that, or we can directly use https://github.com/JuliaGraphs/MetaGraphs.jl if we don't want to build our own graph type and just want to stick metadata on one. If we go this route, then we get all of graph IO, analysis functions, visualization, etc. for free, so I'm 90% sure we'll want to do that instead of having our own implementation.

isaacsas commented 6 years ago

@Gaussia moving our discussion here as @ChrisRackauckas requested.

For building a dependency graph, I think it would work best if there is a function to return the dependency graph given a reaction_network. Basically, this function could

  1. Build a map from species to reactions that depend on that species (i.e. reactions with rate laws involving that species).

  2. Build a second map from each reaction to other reactions whose rate law might change when the first reaction is executed. This uses the map from step 1.

For a large network it might take some time to construct (I think it would take O(K) time, if there are K reactions).

I can write this function if the reaction_network has the two added fields I mentioned.
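The two steps above can be sketched as follows. This is a toy Python mock-up; the reaction representation (which species a rate law reads, which species a firing changes) stands in for the actual reaction_network fields, and all names are illustrative.

```python
# Toy reaction data: which species each rate law depends on, and which species
# each reaction changes when it fires (net stoichiometric change != 0).
reactions = {
    "birth": {"rate_deps": set(),  "changes": {"X"}},
    "decay": {"rate_deps": {"X"},  "changes": {"X"}},
    "conv":  {"rate_deps": {"X"},  "changes": {"X", "Y"}},
    "degY":  {"rate_deps": {"Y"},  "changes": {"Y"}},
}

# Step 1: species -> reactions whose rate laws depend on that species.
species_to_rxns = {}
for name, r in reactions.items():
    for s in r["rate_deps"]:
        species_to_rxns.setdefault(s, set()).add(name)

# Step 2: reaction -> reactions whose rates must be recalculated after it
# fires, i.e. the union over all species it changes of the map from step 1.
dep_graph = {
    name: set().union(*(species_to_rxns.get(s, set()) for s in r["changes"]))
    for name, r in reactions.items()
}

print(sorted(dep_graph["birth"]))  # ['conv', 'decay']
print(sorted(dep_graph["conv"]))   # ['conv', 'decay', 'degY']
```

Both passes touch each (reaction, species) pair a constant number of times, which matches the roughly O(K) construction cost mentioned above for sparse networks.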

TorkelE commented 6 years ago

@isaacsas Sounds good. I will start thinking about making some modifications to what info goes into the ReactionStruct structure, and adding dependencies sounds like a good plan.

@ChrisRackauckas Exactly what is the ReactionSystem you mention? Would it be an alternative IR which the DSL creates? Is it related to ModelingToolkit, or to what Ale is making? Or is it some abstract future extension of the current IR that we are now discussing?

I am very fond of the current DSL :) — it is concise, looks good, and is easy to read (or at least I think so, but then I have also been using it for a while).

AleMorales commented 6 years ago

At the moment ReactionDSL is in stand-by and has not been updated for a month, partly because ModelingToolkit.jl was such a fast-moving target but also due to my own need to focus on my day job. However, I think the basic idea is already fully implemented in ReactionDSL; it just needs some updating and added features. I will try to do such an update ASAP and write down a description of how it all works, but you will probably need to wait until the weekend... Hopefully that will help change the bus factor of the project, as I don't have that much time to work on it at the moment.

In the meantime I give some examples below (they probably do not work anymore with the latest version of ModelingToolkit.jl).

Species are just simple types with optional fields that could store default values, metadata, compartments, etc.

```julia
# Basic elements
@Species s1 s2 s3 = 1
@test s1 == Species(:s1)
@test s3 == Species(:s3, 1)
```

A formula is just another data structure where species are stored as reactants (with info on stoichiometry):

```julia
@Param st
f1 = 2s1 + st*s2 ⟺ s3
f2 = Formula(:⟺, [Reactant(s1, Constant(2)), Reactant(s2, st)],
             [Reactant(s3, Constant(1))])
@test f1 == f2
```

The type of arrow determines directionality, reversibility, etc. This could map 100% of the arrows in DiffEqBiological.jl.

When you want to assign the rate expression you just do the following, to create a Reaction:

```julia
# An actual reaction
@Param k Keq
r1 = 2s1 + st*s2 ⟺ s3 ~ k*(s1*s2 - s3/Keq)
op = k*(s1*s2 - s3/Keq)
r2 = Reaction(f1, op)
@test r1 == r2
```

The right hand side is 100% handled by ModelingToolkit.jl. A whole reaction system would be:

@Param gₓ xₒ kₓ Kₑ vₛ Kₘ
@IVar t
@Species x y
eq = [∅ ⟺ x ~ gₓ*(xₒ - x),
      x ⟺ y ~ kₓ*(x - y/Kₑ),
      y ⟺ ∅ ~ vₛ*y/(Kₘ + y)]
ode1 = ReactionODE(eq, t)
generate_ode_function(ode1)

ode1 contains a list of Reactions, plus all the inputs and outputs of the model already extracted from the reactions (this would be your ReactionSystem). generate_ode_function then lowers this data structure into a DiffEqSystem.

Summary

ReactionDSL does not use macros at all; it all relies on defining a series of data types (Species, Reactants, Formula, Reaction) and defining methods for operators (including arrows) to parse the expressions. It makes use of ModelingToolkit.jl as much as possible (so Species behave like Variables and the rate expressions are just Operations).

The limitations I have encountered so far is that (i) operators need to be chosen carefully based on the precedence of parsing and sometimes parentheses cannot be avoided and (ii) algebraic expressions on Species need to be parsed with different meanings (i.e. as a reactant in a formula describing a reaction and as a variable in a rate expression).

Mapping these data structures to graphs should be easy. Indeed, these data structures are kind of similar to SBML, and I know with SBML it is possible to map the model description to a graph.

isaacsas commented 6 years ago

@Gaussia Great, thanks!

If you've gotten too much to do between all these discussions I'd be happy to take charge of updating the JumpProblem construction to use both MassActionRates and ConstantRateJumps. I can't see how to do it without a field like has_mass_action_rate_law, but once that is in it should be easy to set up. (You're welcome to handle it yourself if you see an easy way to do it with the current reaction_network structure.)

ChrisRackauckas commented 6 years ago

ReactionDSL does not use macros at all; it all relies on defining a series of data types (Species, Reactants, Formula, Reaction) and defining methods for operators (including arrows) to parse the expressions. It makes use of ModelingToolkit.jl as much as possible (so Species behave like Variables and the rate expressions are just Operations).

Out of curiosity, why use Species instead of a Variable and Reaction instead of an Operation?

Also, am I reading this correctly that ∅ ⟺ x ~ gₓ*(xₒ - x) is ∅ ⟺ x with a rate gₓ*(xₒ - x)?

But yeah, the point is that @Gaussia's macro could lower to the form that @AleMorales is writing about and spit out that ReactionODE (to fit the naming system imposed in ModelingToolkit.jl, I would go with ReactionSystem). That would give a quick form for writing out networks, while the macro-free form is for larger expressions and is more programmable.

(The problem with macros is you have to have the full expression at compile-time, so it's hard to computationally build large networks which is where we want to go. For example, adding a bunch of reactions using a for loop).
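To make the for-loop point concrete, here is a minimal self-contained sketch of building a reaction chain programmatically. Note the `Species` and `Reaction` types below are simplified stand-ins mirroring the ReactionDSL examples above, not the real package types:

```julia
# Simplified stand-ins for illustration only (not the real ReactionDSL types).
struct Species
    name::Symbol
end

struct Reaction
    substrates::Vector{Species}
    products::Vector{Species}
    rate::Expr          # the rate law, kept as a symbolic expression
end

species = [Species(Symbol(:s, i)) for i in 1:10]

# Build a linear chain s1 -> s2 -> ... -> s10 in a loop,
# exactly the kind of construction a compile-time macro makes awkward.
reactions = Reaction[]
for i in 1:9
    rate = :( $(Symbol(:k, i)) * $(species[i].name) )
    push!(reactions, Reaction([species[i]], [species[i + 1]], rate))
end
```

A macro-free `ReactionSystem(reactions)` constructor could then consume this vector directly.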

ChrisRackauckas commented 6 years ago

Talking with @sbromberger


we want to store chemical reaction networks in AbstractGraphs to give us all of the nice LightGraphs features.

Seth Bromberger [8:31 AM]
nice
ok, so the reason we chose MetaDict is because you can basically put whatever you want in there.
but - I think you might be confused as to where we use it.
MetaDict is *only* used for indices. If you’re not indexing by property, you won’t use it.
```
mutable struct MetaGraph{T<:Integer,U<:Real} <: AbstractMetaGraph{T}
    graph::SimpleGraph{T}
    vprops::Dict{T,PropDict}
    eprops::Dict{SimpleEdge{T},PropDict}
    gprops::PropDict
    weightfield::Symbol
    defaultweight::U
    metaindex::MetaDict
    indices::Set{Symbol}
end
```
if all you need to do is store metadata on vertices or edges, we use a PropDict, which is simply `const PropDict = Dict{Symbol,Any}`

Chris Rackauckas [8:35 AM]
I am thinking we just want nodes, edges between things that react with each other, and some metadata that says what the reaction's name is and things like that.

Seth Bromberger [8:35 AM]
are you planning on inducing subgraphs using this metadata?
if not, then you don’t need metaindex.
it sounds like all you really need is eprops and possibly vprops.
does that make sense?

Chris Rackauckas [8:47 AM]
yeah
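Based on Seth's description, a minimal sketch of the vprops/eprops-only usage (no metaindex) might look like the following; it assumes the LightGraphs/MetaGraphs API of that era:

```julia
using LightGraphs, MetaGraphs

# Species as vertices; edges link species that appear in a common reaction.
mg = MetaGraph(SimpleGraph(3))            # 3 species, no edges yet
set_prop!(mg, 1, :name, :S)               # vertex metadata goes in vprops
set_prop!(mg, 2, :name, :E)
set_prop!(mg, 3, :name, :SE)

add_edge!(mg, 1, 3)
set_prop!(mg, 1, 3, :reaction, :binding)  # edge metadata goes in eprops

get_prop(mg, 1, 3, :reaction)             # => :binding
```

Since nothing here is indexed by property, no metaindex is ever touched, matching Seth's "all you really need is eprops and possibly vprops".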
AleMorales commented 6 years ago

Out of curiosity, why use Species instead of a Variable and Reaction instead of an Operation?

Species behaves as a Variable in the rate equation, but I want to identify them more easily inside formula expressions (that may have parameters as stoichiometry). Also, this was designed at a time when SciCompDSL did not allow storing a lot of metadata, and Species may need some extra metadata (e.g. compartments).

Similarly, Reaction instead of Operation because ReactionDSL supports mixing reactions with equations (e.g. for intermediate variables), so I need to distinguish them as special cases. Also, a Reaction stores a Formula object as one of the arguments.

Also, am I reading this correctly that ∅ ⟺ x ~ gₓ*(xₒ - x) is ∅ ⟺ x with a rate gₓ*(xₒ - x)?

Yes. How the rate is used in the lowered system depends on the type of arrow (open/closed, reversible, etc.).

(The problem with macros is you have to have the full expression at compile-time, so it's hard to computationally build large networks which is where we want to go. For example, adding a bunch of reactions using a for loop).

Exactly. A third option could be to have macros that facilitate the construction of Reactions and then push! such reaction objects into the ReactionSystem. This could avoid some of the limitations of an embedded DSL (i.e. only relying on operators) while still keeping all the advantages of an explicit data structure.

ChrisRackauckas commented 6 years ago

Species behaves as a Variable in the rate equation, but I want to identify them more easily inside formula expressions (that may have parameters as stoichiometry). Also, this was designed at a time when SciCompDSL did not allow storing a lot of metadata, and Species may need some extra metadata (e.g. compartments).

You can make @Species build a :Species variable with a context that holds the appropriate metadata. That's what that field is for, but it probably didn't exist when you first made this. If you make a variable like this then translating to an ODE should be more automatic and gets rid of the code at:

https://github.com/AleMorales/ReactionDSL.jl/blob/master/src/Elements.jl#L44-L47

Similarly, Reaction instead of Operation because ReactionDSL supports mixing reactions with equations (e.g. for intermediate variables), so I need to distinguish them as special cases. Also, a Reaction stores a Formula object as one of the arguments.

Yeah, this is a bit more than just an Operation so it should be its own entity.

https://github.com/AleMorales/ReactionDSL.jl/blob/master/src/Elements.jl#L50-L67

That's a really interesting idea. Is this an AbstractComponent in the sense of Modelica, or is it good to keep those ideas separate?

https://github.com/JuliaDiffEq/ModelingToolkit.jl/issues/36

chethega commented 6 years ago

Ah, ok, I now understand the DSL problem; I failed at terminology. Most of you meant by "DSL" a set of macros or definitions that allow one to write text files that (1) look like a chemical network (concise, human readable) and (2) are also executable julia files; when included (in contexts where we are using the requisite packages) they generate a complicated expression tree / data structure that can compile to numerical code, or be used for qualitative analysis, etc.; in short, that encodes the chemical network in a form that is convenient for programmatic access, for symbolic, graph-theoretic, or numerical operations.

I also wanted it to (3) conform to a strict and simple file format, such that one could write a julia/python/perl/javascript parser in an afternoon, and (4) allow it to also serve as an output format for programs that generate or manipulate chemical networks. We could skip writing a parser in julia, because of (2), but we should still write an export-to-file for models that were programmatically created or modified (or imported from other file formats).

The combination of (4) and (2) automatically gives us a "prettify". Obviously each output file should start with its (optional) file format version, e.g. @ReactionNetwork_file_format 0.1. Handwritten files would typically not contain this and rather default to being parsed as the newest version, because boilerplate sucks; post-prettify you would have this as part of the file, so the workflow would be "play in jupyter until the network is good, then export the file" (you can copy-paste from the export into jupyter cells because of (2)), and the archived networks still work when DiffEqBiological.jl updates.

Writing an ABNF spec is probably overkill (and if someone proposes an xml schema I will point at SBML and run away crying), and I would be happy with an informal spec that is readable for people who know nothing about julia. But having a format that is convenient for humans, simple enough that we could write a short ABNF spec, that is expressive enough for the types we have, and is still a small subset of the julia language (as parsed by the femtolisp parts of julialang), would still be a plus.
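A hedged sketch of what point (4)'s export might look like, using a stand-in tuple layout rather than the real internal IR (`export_network` and the tuple fields here are hypothetical names):

```julia
# Hypothetical export: reactions as (rate_constant, substrates, products)
# tuples; a real exporter would walk the actual network IR instead.
reactions = [(:c1, "S + E", "SE"),
             (:c2, "SE", "S + E"),
             (:c3, "SE", "P + E")]

function export_network(io::IO, reactions)
    println(io, "@ReactionNetwork_file_format 0.1")
    println(io, "rs = @reaction_network begin")
    for (rate, lhs, rhs) in reactions
        println(io, "  ", rate, ", ", lhs, " --> ", rhs)
    end
    println(io, "end ", join(first.(reactions), " "))
end

export_network(stdout, reactions)
```

Because the emitted text is itself valid DSL input, round-tripping through this exporter is exactly the "prettify" described above.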

AleMorales commented 6 years ago

That's a really interesting idea. Is this an AbstractComponent in the sense of Modelica, or is it good to keep those ideas separate?

JuliaDiffEq/ModelingToolkit.jl#36

I think AbstractComponent is supposed to be lowered to one or more equations? Reactions will actually be used in multiple equations. But in the end, the mass balance of a Species is just the combination of reactions, so maybe there is an analogy there, I am not sure.

ChrisRackauckas commented 6 years ago

Yeah, let's be clear about the language. The DSL, or domain-specific language, is a language (in Julia, using the Julia parser) which writes to an intermediate representation (IR). That IR then has IR passes (transformations) and then compiles to byte code (here we use the Julia compiler and build Julia functions and types).

Right now the DSL we have is the @reaction_network DSL:

http://docs.juliadiffeq.org/latest/models/biological.html#The-Reaction-DSL-1

rs = @reaction_network begin
  c1, S + E --> SE
  c2, SE --> S + E
  c3, SE --> P + E
end c1 c2 c3

The pros of a macro-based DSL are that it's compact, concise, and easily human readable. The bad part is that, since macros act at compile time, they are hard to utilize programmatically. This DSL has its own internal IR which can spit out some problems to solve:

p = (0.00166,0.0001,0.1)
# S = 301, E = 100, SE = 0, P = 0
prob = DiscreteProblem([301, 100, 0, 0], (0.0, 100.0), p)
jump_prob = JumpProblem(prob, Direct(), rs)
sol = solve(jump_prob, Discrete())

The question is where to go next. ModelingToolkit.jl is an intermediate representation for Julia-based DSLs. It somewhat has a natural DSL on it through some basic syntactic sugar.

using ModelingToolkit

# Define some variables
@IVar t
@DVar x(t) y(t) z(t)
@Deriv D'~t
@Param σ ρ β

eqs = [D*x ~ σ*(y-x),
       D*y ~ x*(ρ-z)-y,
       D*z ~ x*y - β*z]
de = DiffEqSystem(eqs)

where instead of those macros we could have done:

t = IndependentVariable(:t)
x = DependentVariable(:x,dependents = [t])
y = DependentVariable(:y,dependents = [t])
z = DependentVariable(:z,dependents = [t])
D = Differential(t) # Default of first derivative, Derivative(t,1)
σ = Parameter(:σ)
ρ = Parameter(:ρ)
β = Parameter(:β)

and then since it's a compilation IR it has transformation functions that spit outputs for compilation, for example:

ModelingToolkit.generate_ode_function(de)

## Which returns:
:((du, u, p, t)->begin
                x = u[1]
                y = u[2]
                z = u[3]
                σ = p[1]
                ρ = p[2]
                β = p[3]
                x_t = σ * (y - x)
                y_t = x * (ρ - z) - y
                z_t = x * y - β * z
                du[1] = x_t
                du[2] = y_t
                du[3] = z_t
            end
        end)

gives a Julia function to compile.

@AleMorales 's ReactionDSL and the @reaction_network aren't necessarily different ideas, they are different levels of the abstraction. The ReactionDSL is actually more of an expansion to the IR specifically for chemical reaction networks. We can keep the macro by having @reaction_network spit out the stuff you would've wanted to write for a ReactionDSL, making it syntactic sugar over a more extendable system.

Since that representation is programmable, SBML parsing is really just another DSL targeting the same IR that @reaction_network would, so the features would come for free. Then spitting out SBML would just be a "compilation target" from the system.

Now let's dissect the flow of this a little bit. To build the IR, you first build your expressions. Here it only has variables and Operations. An Operation is a computational graph built automatically via function dispatches on variables, so D*x ~ σ*(y-x) builds an Operation which basically says "the left hand side of this operation is the derivative of x and the right hand side is σ*(y-x), which is itself an Operation that is ...", which is why it's a recursive tree structure. So then

using ModelingToolkit

# Define some variables
@IVar t
@DVar x(t) y(t) z(t)
@Deriv D'~t
@Param σ ρ β

eqs = [D*x ~ σ*(y-x),
       D*y ~ x*(ρ-z)-y,
       D*z ~ x*y - β*z]

is just a bunch of expressions. Then

de = DiffEqSystem(eqs)

puts it all into one object, the AbstractSystem, and then the compilation and analysis is written as transformation on systems.

So for reactions, IMO it looks like we want to extend the IR to have the Reaction type (does Formula need to be separate from Reaction? https://github.com/AleMorales/ReactionDSL.jl/blob/master/src/Elements.jl#L63-L67 ) and then have a ReactionSystem that has transformation functions to other DiffEqSystems, but also has a ::Union{Void,...} field that a graph can be added into if you construct_reaction_network!(rs).

The interesting issue is to build out ModelingToolkit.jl to get feature-parity: SDEs and jumps need to be added to the DiffEqSystem. But that's a separate issue from re-structuring the DSL: if you just build the reaction DSL assuming the rest of the diffeq systems will be handled, the features will come when I get to that (hopefully before the summer).

ChrisRackauckas commented 6 years ago

I think AbstractComponent is supposed to be lowered to one or more equations? Reactions will actually be used in multiple equations. But in the end, the mass balance of a Species is just the combination of reactions, so maybe there is an analogy there, I am not sure.

Yeah it's probably too different.

alanderos91 commented 6 years ago

I'm not an expert on this subject, but I'm throwing this out there for visibility: https://arxiv.org/pdf/1412.5257.pdf

The basic idea: It seems there are some nice results that can identify reaction networks that admit multiple steady states based on the structure of networks. For example, it is possible (for the "right types of networks") to prove that a given reaction network has multiple steady states while a smaller, embedded sub-network cannot produce the same steady states.

The lead authors have additional papers and references on their personal websites regarding multi-stationarity.

chethega commented 6 years ago

I'm not an expert on this subject, but I'm throwing this out there for visibility: https://arxiv.org/pdf/1412.5257.pdf

Yeah, that's a much better survey of what I called Feinberg-style methods; thanks for linking!

Unfortunately, checking qualitative networks for multiple steady states is afaik not easy. That is, whether a subexponential algorithm exists is still an open question; and if such an algorithm exists then it also solves https://arxiv.org/pdf/math/9911268.pdf. Quite frankly, inventing such an algorithm is probably above my paygrade, if it even exists (just implementing the linked algorithm, which solves a much simpler problem, would probably be quite daunting for me). Of course, there is a trivial exponential-time algorithm, in typical fashion for NP problems: there is always a witness that allows fast verification of multi-stationarity; it is unknown whether witnesses for non-multi-stationarity exist; there are "only" exponentially many potential witnesses, and you can proceed by trying all of them.

Luckily, in practice, irreducible multi-stationary networks should tend not to be too small, and witnesses of their multiple steady states should tend not to be too rare, which means that trying out all potential witnesses might not even be catastrophically bad. (I'm not sure whether the witnesses of multi-stationarity are published anywhere; I'll probably post an unpolished preprint on the arxiv along with the code here.)

isaacsas commented 5 years ago

@chethega Is this still something you are interested in? I think we've reached a point where most of the information you want is available, so we could easily add a simple API for querying and getting it out. With the new @min_reaction_network macro, we can handle tens of thousands of reactions in a few seconds, so larger networks are now possible. For example:

What kind of things do I need from the network:

I would need to extract, for each reaction, stoichiometry and dependencies (on which concentrations does the reaction rate depend).

Given an rn = @reaction_network... or @min_reaction_network..., we can extract this info pretty easily for the ith reaction as:

# vector of ReactantStruct storing species symbol and stoichiometry coefficient 
rn.reactions[i].substrates 

# vector of ReactantStruct storing species symbol and stoichiometry coefficient 
rn.reactions[i].products

# vector of species the reaction law depends on 
rn.reactions[i].dependents    # dependent species symbols

# ODE rate law expression
rn.reactions[i].rate_DE

# true if pure mass action reaction
rn.reactions[i].is_pure_mass_action

We also have functionality for making reaction-reaction dependency graphs, and substrate-reaction graphs.
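As a small illustration of consuming those fields, here is a sketch of a species-to-reactions dependency map; the NamedTuple `rn` is a stand-in mimicking only the `dependents` field, not the real reaction_network structure:

```julia
# Stand-in network exposing only the `dependents` field described above.
rn = (reactions = [(dependents = [:S, :E],),
                   (dependents = [:SE],),
                   (dependents = [:SE],)],)

# Map each species to the indices of the reactions whose rate depends on it.
function species_to_reactions(rn)
    deps = Dict{Symbol, Vector{Int}}()
    for (i, rx) in enumerate(rn.reactions)
        for spec in rx.dependents
            push!(get!(deps, spec, Int[]), i)
        end
    end
    deps
end

species_to_reactions(rn)  # e.g. maps :SE to reactions [2, 3]
```

The same loop with `substrates`/`products` instead of `dependents` would give a species-reaction (bipartite) graph rather than a rate-dependency map.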

Concordance analysis would also require monotonicity information (each reaction rate can depend on each concentration either positively, negatively, with unspecified sign, or not at all). Unless we do some extensive interval math, I'd guess that this information would need to be user-supplied.

For mass-action reactions and the different Hill/MM functions we support, this is easy to extract. For more general rates you'd probably want user input, as you mention.
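For the pure mass-action case the sign information is indeed mechanical: a rate k·∏cᵢ^νᵢ is monotonically increasing in every substrate concentration and independent of everything else. A small sketch (the function name is made up for illustration):

```julia
# +1 = rate increases in this species, 0 = no dependence.
# Negative or unknown signs (for general kinetics) would need user input.
function massaction_signs(substrates, allspecies)
    Dict(s => (s in substrates ? +1 : 0) for s in allspecies)
end

signs = massaction_signs([:S, :E], [:S, :E, :SE, :P])
# signs[:S] == 1, signs[:SE] == 0
```

Hill/MM rates would extend this with a -1 case for repressing species, and anything not recognized would fall through to a user-supplied sign.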

If we want to do some more exotic Feinberg-style analyses which only apply to mass-action kinetics, we would also need to know which reactions are supposed to have strict mass action kinetics. This would need to be a user-supplied parameter: Most of the time, you consider mass-action as an approximation of some unfathomably complicated kinetics; sometimes (rarely) you want to consider it physical/mathematical truth.

As I said above, this is there now.

Afterwards, we would like to visualize the output of the analysis, and store it in some usable data structure. We would further like to use it for the steady state solve, and if parameter-tracking / bifurcation analysis ever becomes a thing in diffeq, then we would like to use it there as well.

This is something that would be great, but I don't understand what the standard approaches are. People have posted about "visualizing" reaction networks, but it seems like there are many different networks one could visualize. Is there a canonical reaction network graph that people mean? How is it, or the output of these types of analyses on it, commonly visualized?

isaacsas commented 2 months ago

Qualitative analysis is progressing and we can continue discussion in more concrete issues for specific functionality.