KineticPreProcessor / KPP

The KPP kinetic preprocessor is a software tool that assists the computer simulation of chemical kinetic systems
GNU General Public License v3.0

visualization of the chemical mechanism #2

Open RolfSander opened 3 years ago

RolfSander commented 3 years ago

Automatic visualization of the chemical mechanism with graphviz.

jimmielin commented 2 years ago

I just wanted to confirm this is still under the 3.0.0 target - are there any blockers for implementing this within KPP, besides requiring atomic composition to be specified for each species? This could be a nice feature and a demo could be included in the KPP documentation as well. Thanks!

RolfSander commented 2 years ago

The atomic composition is indeed the only reason why I haven't started with the visualization yet. Plotting the whole mechanism makes no sense: you wouldn't be able to see anything in that plot. Currently, my visualization code is very MECCA-specific. It makes plots for several subsets of the reaction mechanism, for example one plot for bromine chemistry and one plot for chlorine chemistry. Even if I started now, it would take some time to make the code independent of MECCA. I have moved the milestone to 4.0.0 now.

obin1 commented 1 year ago

> Automatic visualization of the chemical mechanism with graphviz.

I was inspired by this idea, so here's a step towards graphviz compatibility: https://github.com/obin1/KPP-fixStoichiom/commit/37ef030f67f8e58e67e22ce0a484fca5cf8fde41. This addition creates a DOT language file of the mechanism. Once the automatically generated header is removed, the file can be used with graphviz (dot -Tpng small_strato_SpeciesReactionGraph.gv > small_strato.png) to visualize the mechanism as a bipartite species-reaction graph (see the example below for KPP's small_strato). In other work I've used the biadjacency matrix to represent the same type of graph, and this fork creates that format too, but I think DOT might be a more widely used format for network visualization.
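To make the species-reaction graph idea concrete, here is a minimal Python sketch of what writing such a DOT file could look like. It is not the C code from the linked commit: the function name `to_dot`, the node shapes, and the three-reaction toy mechanism are all assumptions chosen for illustration.

```python
def to_dot(reactions, name="mechanism"):
    """Return DOT text for a bipartite species-reaction digraph.

    `reactions` is a list of (label, reactants, products), where reactants
    and products map species names to stoichiometric coefficients.
    """
    species = sorted({s for _, reac, prod in reactions for s in (*reac, *prod)})
    lines = [f"digraph {name} {{"]
    for s in species:
        lines.append(f'  "{s}" [shape=ellipse];')      # species nodes
    for label, reac, prod in reactions:
        lines.append(f'  "{label}" [shape=box];')      # reaction nodes
        for s, coeff in reac.items():
            lines.append(f'  "{s}" -> "{label}" [label="{coeff}"];')
        for s, coeff in prod.items():
            lines.append(f'  "{label}" -> "{s}" [label="{coeff}"];')
    lines.append("}")
    return "\n".join(lines)


# Toy oxygen-only mechanism (illustrative, not the fork's output).
REACTIONS = [
    ("R1", {"O2": 1}, {"O": 2}),
    ("R2", {"O": 1, "O2": 1}, {"O3": 1}),
    ("R3", {"O3": 1}, {"O": 1, "O2": 1}),
]
dot_text = to_dot(REACTIONS, "toy")
```

Saving `dot_text` to a .gv file should give something that the dot command above can render, with species drawn as ellipses and reactions as boxes.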

Explicitly including atomic composition could be useful, but I'm not sure it's essential for this: isn't stoichiometry already (hopefully) implied in the .eqn file? If we only want to plot a subset/induced subgraph of the reaction mechanism, maybe the specific submechanism could be chosen by some sort of mask when writing the files (could existing tools, like families, be expanded to make this possible?). I know that @emyli19 is working on Python tools to visualize subsets of mechanisms, but this might also be a useful tool directly in KPP. Would people still be interested in something like this?

[Image: bipartite species-reaction graph for small_strato]

RolfSander commented 1 year ago

Hello @obin1, it's great to see your interest in graphs and mechanism visualization! I think there are many aspects that we can discuss, so I've tried to sort them somewhat...

  1. Graph type:

In my graphs, reactions are always represented as edges, whereas you are generating bipartite species-reaction graphs (with reactions as nodes). I think that both types have their own pros and cons for our purposes. It's good to have code for both!

  2. Technical approach:

You have implemented your additions directly into the KPP C code. My code is independent of the KPP program but it reads the KPP .spc and .eqn input files. I think I will stick to my approach because it allows me to code in Python (which I'm more familiar with) instead of C.

  3. Software:

Indeed, graphviz (dot) is probably the most widely used format for network visualizations. I used to create my own dotfiles with awk, but now I have switched to a Python module called graph-tool (https://graph-tool.skewed.de). It can use graphviz under the hood, and in addition it has a large number of graph-theory related tools. For example, it can find the most important chemical pathways from A to B via an Edmonds-Karp algorithm.

  4. Creation of submechanisms (induced subgraphs):

With ever-increasing complexity of atmospheric chemical mechanisms, I think that generating submechanisms will become more and more important. In my code, I can choose between different criteria to define a submechanism, e.g., based on elements, number of carbon atoms, or picking all species involved in the reaction sequence from A to B. Plotting a family as one node instead of showing all family members individually is also a good idea which could be worth implementing.
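As a rough illustration of the element-based criterion in point 4, the sketch below selects all species containing a given element and takes the induced subgraph of a unipartite edge list. It is not taken from the code described above; the composition table, the edge list, and both function names are hypothetical.

```python
# Hypothetical atomic composition per species (KPP would have to supply this).
COMPOSITION = {
    "NO": {"N": 1, "O": 1},
    "NO2": {"N": 1, "O": 2},
    "O2": {"O": 2},
    "O3": {"O": 3},
    "Br": {"Br": 1},
    "BrO": {"Br": 1, "O": 1},
}

# Unipartite edges as (reactant, product, reaction label); illustrative only.
EDGES = [
    ("NO", "NO2", "R1"), ("NO", "O2", "R1"),
    ("O3", "NO2", "R1"), ("O3", "O2", "R1"),
    ("Br", "BrO", "R2"), ("Br", "O2", "R2"),
    ("O3", "BrO", "R2"), ("O3", "O2", "R2"),
]


def species_containing(composition, element):
    """Return the set of species with at least one atom of `element`."""
    return {name for name, atoms in composition.items() if atoms.get(element, 0) > 0}


def induced_subgraph(edges, selected):
    """Keep only edges whose two endpoints are both in `selected`."""
    return [(u, v, label) for u, v, label in edges if u in selected and v in selected]


bromine = species_containing(COMPOSITION, "Br")
bromine_edges = induced_subgraph(EDGES, bromine)
```

With this toy input, the bromine submechanism keeps only the Br to BrO edge. Other criteria (number of carbon atoms, species on a path from A to B) would only change how the `selected` set is built.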

obin1 commented 1 year ago

Thanks for binning these topics, @RolfSander. I took a while to gather some thoughts; here they are:

  1. Graph type: I agree there are uses for both bipartite and unipartite graphs. However, a bipartite graph can always be projected to a unipartite graph, but not always the other way around, so it might be best to start with a bipartite graph. I've found in some recent work that unipartite graphs are good for analyzing overall species-species relationships, but they lose insight on reactions unless you use a multigraph with an edge for each reaction. This can get messy, as unipartite graphs can also split the same reaction into multiple edges between different pairs of species (unless using some sort of hypergraph approach for edges that connect all involved species; at that point, why not just work in the bipartite space?).

  2. Technical approach: Not just you -- I know a good number of people who have written their own parsers for .eqn and .spc files, but it seems like reinventing the wheel, especially as .eqn exists as input for an existing parsing tool. I thought it might be useful for future users to have this graph compatibility (for mechanism visualization but also other graph applications) built directly into a future version of KPP. What are your thoughts? I'd be happy to contribute to this if you think this would be a worthwhile feature.

  3. Software: Thanks for the recommendation. We recently moved from igraph to networkx, but will check out graph-tool.

  4. Submechanisms: I like these ideas! Tagging @emyli19, who has been developing some submechanism visualization tools in Python; some of these input options might be good to include as keyword arguments at some point.
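The projection mentioned in point 1 can be sketched in a few lines of plain Python. This is one possible convention (an edge from every reactant to every product), chosen here as an assumption; the function names and the two photolysis-like toy reactions are illustrative only.

```python
from collections import defaultdict


def project(reactions):
    """Project a bipartite mechanism to a unipartite multigraph.

    Each reaction (label, reactants, products) becomes one edge per
    reactant-product pair, so a single reaction may yield several edges.
    """
    edges = []
    for label, reactants, products in reactions:
        for r in reactants:
            for p in products:
                edges.append((r, p, label))
    return edges


def merge_parallel(edges):
    """Merge parallel edges, keeping the list of reactions behind each one."""
    merged = defaultdict(list)
    for u, v, label in edges:
        merged[(u, v)].append(label)
    return dict(merged)


# Toy input: two reactions that both turn O3 into O2 plus something else.
REACTIONS = [
    ("R3", ["O3"], ["O", "O2"]),
    ("R5", ["O3"], ["O1D", "O2"]),
]
multi_edges = project(REACTIONS)
simple_edges = merge_parallel(multi_edges)
```

Here the two reactions become four edges, and after merging, the O3 to O2 edge carries both reaction labels. Going back from `simple_edges` to the original reactions is not possible in general, which is the asymmetry described above.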

RolfSander commented 1 year ago

Thanks for your comments. A few replies:

1 Graph type

I agree that the bipartite graph is a cleaner way to store all the important information. However, for the visualization I prefer unipartite graphs. Chemists expect to see reactions as arrows (i.e., directed edges) and not as nodes. A suitable approach for us could be to create the bipartite graph as the master file, and then convert to unipartite whenever needed.

2 Technical approach

I don't think that KPP is the right tool to perform any complex graph-related operations. However, it would indeed be a very useful new feature if KPP is able to create a graph that contains the full reaction mechanism from the *.eqn file. A suitable way to save the graph could be the XML-based GraphML format:

https://en.wikipedia.org/wiki/GraphML

http://graphml.graphdrawing.org/

Note that both networkx and graph-tool are able to read and write in GraphML format.
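For a feel of what such a file involves, here is a minimal sketch that writes a bipartite species-reaction graph as GraphML using only the Python standard library. The attribute names `kind` and `coeff`, the helper names, and the one-reaction input are assumptions for illustration, and the sketch assumes species names and reaction labels never collide.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{NS}}}{tag}"


def to_graphml(reactions):
    """Return GraphML text for (label, reactants, products) reactions."""
    ET.register_namespace("", NS)
    root = ET.Element(q("graphml"))
    # Declare typed attributes: a string on nodes, a double on edges.
    ET.SubElement(root, q("key"), {"id": "kind", "for": "node",
                                   "attr.name": "kind", "attr.type": "string"})
    ET.SubElement(root, q("key"), {"id": "coeff", "for": "edge",
                                   "attr.name": "coeff", "attr.type": "double"})
    graph = ET.SubElement(root, q("graph"), {"id": "G", "edgedefault": "directed"})

    def add_node(node_id, kind):
        node = ET.SubElement(graph, q("node"), {"id": node_id})
        ET.SubElement(node, q("data"), {"key": "kind"}).text = kind

    def add_edge(source, target, coeff):
        edge = ET.SubElement(graph, q("edge"), {"source": source, "target": target})
        ET.SubElement(edge, q("data"), {"key": "coeff"}).text = str(float(coeff))

    species = sorted({s for _, reac, prod in reactions for s in (*reac, *prod)})
    for s in species:
        add_node(s, "species")
    for label, reac, prod in reactions:
        add_node(label, "reaction")
        for s, c in reac.items():
            add_edge(s, label, c)
        for s, c in prod.items():
            add_edge(label, s, c)
    return ET.tostring(root, encoding="unicode")


xml_text = to_graphml([("R2", {"O": 1, "O2": 1}, {"O3": 1})])
```

The resulting text has one node element per species and per reaction, and one edge element per reactant or product, each carrying its stoichiometric coefficient as a typed attribute.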

3 Software

I quickly checked the Wikipedia page of networkx. It looks like a very nice tool. However, with growing chemical mechanisms, the speed of graph-tool compared to networkx could become important:

https://graph-tool.skewed.de/performance

obin1 commented 1 year ago

To follow up on these:

1) That sounds good, visualization is more intuitive (and potentially less messy) as a unipartite graph, which can be generated from the bipartite graph.

2) I agree that it doesn't make sense to make KPP a network analysis library; there are already several good tools out there. But it is quite straightforward to build into KPP some graph preprocessing functionality while we parse the chemical mechanism. I wrote this initial example for the DOT format, which I found easier to code up in C, but I think both DOT and GraphML are readable and writable by both networkx and graph-tool. My next step is to include reciprocal reactions in the generated DOT format, which are currently left out of the biadjacency matrix but essential for other applications.

3) Good tip -- I might move to graph-tool for some applications that need higher performance.

RolfSander commented 1 year ago

1) OK. I think we can tick off this point. Let's create a bipartite master file.

2) It seems that GraphML is more powerful than DOT. Apparently, edge and node properties can only be strings in DOT:

https://graph-tool.skewed.de/static/doc/quickstart.html#graph-i-o

This would be a severe limitation when I want to add the elemental composition of the species as python dictionaries. Therefore, I prefer GraphML. However, it would be a waste of code not to use the DOT output that you have already written. The solution for us could be to create a new KPP command that eventually will offer both options, e.g.:

#GRAPH OFF (default)
#GRAPH DOT
#GRAPH GRAPHML
#GRAPH ALL

I found a nice document that describes the different formats for graphs:

https://intranet.icar.cnr.it/wp-content/uploads/2018/12/RT-ICAR-PA-2018-06.pdf

RolfSander commented 1 year ago

Hello @obin1 and @emyli19,

A manuscript describing my mechanism explorer software is now open for discussion:

https://doi.org/10.5194/egusphere-2023-1577

If you have any comments or suggestions, feel free to post a public comment there.

My code takes KPP .spc and .eqn files as input and generates a unipartite graph of the mechanism. If you are still interested in creating a bipartite graph directly via KPP, I'd be more than happy to add a new function that can read your graph into my MEXPLORER software.

obin1 commented 1 year ago

Hi @RolfSander, we were actually just looking at this last week! Looks like a really useful tool, especially the interactive visualization.

Our projects are still using bipartite graphs, either in DOT format for reciprocal reactions or the biadjacency matrix for mass balancing in ML applications. I am using KPP to generate these formulations, so we are definitely interested in using MEXPLORER. I would first like to clean up the way this is done, including adding the #GRAPH toggle you mentioned. I consider this motivation to clean up the features I've added to KPP :)
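For readers unfamiliar with the biadjacency form, the sketch below builds a species-by-reaction matrix with signed stoichiometric coefficients and uses it for a simple oxygen-atom balance check. The sign convention (negative for reactants, positive for products), the function name, and the toy reactions are assumptions; the matrix written by the fork may be laid out differently.

```python
def biadjacency(reactions):
    """Return (species, matrix) with rows = species, columns = reactions.

    Entries are net stoichiometric coefficients: consumed species are
    negative, produced species are positive.
    """
    species = sorted({s for _, reac, prod in reactions for s in (*reac, *prod)})
    row = {s: i for i, s in enumerate(species)}
    matrix = [[0] * len(reactions) for _ in species]
    for j, (_, reac, prod) in enumerate(reactions):
        for s, c in reac.items():
            matrix[row[s]][j] -= c
        for s, c in prod.items():
            matrix[row[s]][j] += c
    return species, matrix


# Toy oxygen-only reactions (illustrative).
REACTIONS = [
    ("R1", {"O2": 1}, {"O": 2}),
    ("R2", {"O": 1, "O2": 1}, {"O3": 1}),
]
species, matrix = biadjacency(REACTIONS)

# Mass balance: oxygen atoms per species times each column should sum to zero.
OXYGEN_ATOMS = {"O": 1, "O2": 2, "O3": 3}
balance = [
    sum(OXYGEN_ATOMS[s] * matrix[i][j] for i, s in enumerate(species))
    for j in range(len(REACTIONS))
]
```

A nonzero entry in `balance` would flag a reaction that does not conserve oxygen atoms, which is the kind of constraint a mass-balancing application can enforce.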

Section 2.3.2 is quite relevant to what we are working on: @emyli is quantifying cycles in GEOS-Chem in a bipartite context. I am curious how this is done in a unipartite context in MEXPLORER -- are parallel edges merged for calculation of the net reaction between species?

RolfSander commented 1 year ago

Hello @obin1 and @emyli19,

> we were actually just looking at this last week! Looks like a really useful tool, especially the interactive visualization.

Thanks :-)

Let me know if you want to try it and have any questions about the installation or usage!

> I would first like to clean up the way this is done, including adding the #GRAPH toggle you mentioned. I consider this motivation to clean up the features I've added to KPP :)

Great! I think the best way to proceed would be to create a separate branch for you in the KPP repo where you can develop and test your code. If the default setting (#GRAPH OFF) has no side effects for other KPP users, we'd be happy to include the new feature in the main KPP distribution.

> Section 2.3.2 is quite relevant to what we are working on: @emyli is quantifying cycles in GEOS-Chem in a bipartite context. I am curious how this is done in a unipartite context in MEXPLORER -- are parallel edges merged for calculation of the net reaction between species?

For the visualization, I merge parallel edges that go in the same direction. For the analysis, I'm using a different approach: I create a temporary copy of the graph in which I delete all edges except those for the very fast reactions. This causes the graph to fall apart into several strongly connected components which are detected by graph-tool:

https://graph-tool.skewed.de/static/doc/autosummary/graph_tool.topology.label_components.html

The subgraphs with two or more vertices indicate which species belong to the fast chemical cycles.
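The idea can be sketched without graph-tool as follows. This is not the MEXPLORER code: it swaps in a plain-Python Tarjan algorithm for the strongly connected components, and the edge rates, the threshold, and all names are hypothetical values chosen for illustration.

```python
def strongly_connected_components(vertices, edges):
    """Tarjan's algorithm; returns a list of vertex sets."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(comp)

    for v in vertices:
        if v not in index:
            visit(v)
    return components


# Hypothetical unipartite edges: (reactant, product, rate in arbitrary units).
EDGES = [
    ("NO", "NO2", 10.0), ("NO2", "NO", 8.0),
    ("O3", "O", 5.0), ("O", "O3", 6.0),
    ("NO2", "HNO3", 0.01), ("HNO3", "NO2", 0.001),
]
THRESHOLD = 1.0  # assumed cutoff separating "very fast" reactions

vertices = sorted({s for u, v, _ in EDGES for s in (u, v)})
fast_edges = [(u, v) for u, v, rate in EDGES if rate >= THRESHOLD]
fast_cycles = [c for c in strongly_connected_components(vertices, fast_edges)
               if len(c) >= 2]
```

In this toy example, dropping the slow edges leaves NO and NO2 in one fast cycle and O and O3 in another, while HNO3 ends up alone and is filtered out.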

obin1 commented 1 year ago

Great, I started a new branch! I was working in one of the flux/family branches, but will move the code to the new branch to isolate the graph generation.