ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/

Next Generation RMG Ideas (previously from Wiki) #2731

Open jonwzheng opened 2 weeks ago

jonwzheng commented 2 weeks ago

As we are moving to deprecate the developer wiki, we found a list of suggestions for the long-term development of RMG-Py. For posterity's sake, and to revive discussion, they are listed below (some information may be outdated as the last update was in 2019):

Input/Output

Design a custom file format for mechanism input and output, replacing reaction/thermo libraries, Chemkin files, and Cantera files. The requirements and goals of such a format would include the following:

It should fulfill the following roles:

Note: much of this has been recently proposed, see #2634

Database organization

Kinetics are currently estimated from rate rules, with training reactions converted into rate rules. Ideally, we could change how the estimation is conducted without losing the kinetics data. This means storing the kinetics data in a format that is semi-separate from the estimation technique but can still be used to derive the estimation parameters. Rate rules store specific groups that must exist in the current tree structure, so the data is tied to the method, which prevents switching to better estimation methods. Training reactions are better, but they require labeling specific atoms based on the reaction recipe (which may also change).
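A minimal sketch of what "data separate from method" could look like. Everything here is hypothetical: `KineticsRecord`, `FamilyAverageEstimator`, and the field names are not part of RMG-Py, and the Arrhenius numbers in the usage lines are placeholders, not real kinetics. The point is only that the records carry no tree groups or atom labels, so a different estimator could be fitted to the same records later.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KineticsRecord:
    """Method-independent kinetics datum (hypothetical schema)."""
    reaction: str   # any identifier that does not depend on the estimator
    family: str     # reaction family label
    A: float        # pre-exponential factor
    n: float        # temperature exponent
    Ea: float       # activation energy, J/mol
    source: str     # provenance of the data


class FamilyAverageEstimator:
    """Toy estimator: averages ln(A), n and Ea over each family's records."""

    def fit(self, records):
        grouped = {}
        for rec in records:
            grouped.setdefault(rec.family, []).append(rec)
        self._params = {
            family: (
                math.exp(sum(math.log(r.A) for r in recs) / len(recs)),
                sum(r.n for r in recs) / len(recs),
                sum(r.Ea for r in recs) / len(recs),
            )
            for family, recs in grouped.items()
        }
        return self

    def estimate(self, family):
        """Return (A, n, Ea) for a family seen during fit()."""
        return self._params[family]


# Placeholder values for illustration only.
records = [
    KineticsRecord("R1", "H_Abstraction", 1e10, 0.0, 20000.0, "placeholder"),
    KineticsRecord("R2", "H_Abstraction", 1e12, 1.0, 40000.0, "placeholder"),
]
A, n, Ea = FamilyAverageEstimator().fit(records).estimate("H_Abstraction")
```

Swapping in another estimator would mean writing another class with the same `fit`/`estimate` shape, without touching the stored records.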

Note: potentially could be addressed in #2708

Order of reaction generation and kinetics estimation

Currently in RMG, reaction rates are estimated after processNewReactions, an ordering designed to avoid estimating kinetics for reactions that RMG has already generated. This structure might not be ideal if generating kinetics is not time-intensive and the percentage of reactions removed between generation and kinetics estimation is small. If these two assumptions hold (neither has been checked), the code should estimate kinetics during or right after reaction generation.

There is also an idea that reactions with minor branching ratios could be removed after the reaction generation step. This would be significantly easier if kinetics estimation occurred right after reaction generation.
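As a rough illustration of the branching-ratio idea, the sketch below drops channels that carry only a small share of the total rate out of a given set of reactants. `CandidateReaction`, `prune_minor_channels`, and the `min_branching_ratio` threshold are hypothetical names, not RMG-Py API, and the sketch assumes a single rate value per reaction at one reference condition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateReaction:
    """Hypothetical stand-in for a generated reaction with an estimated rate."""
    reactants: tuple   # labels of the reactant species
    products: tuple    # labels of the product species
    rate: float        # estimated rate at one reference condition


def prune_minor_channels(reactions, min_branching_ratio=0.01):
    """Keep reactions whose share of their reactants' total rate is large enough."""
    # Sum the rates of all channels that start from the same reactants.
    total_rate = defaultdict(float)
    for rxn in reactions:
        total_rate[tuple(sorted(rxn.reactants))] += rxn.rate

    kept = []
    for rxn in reactions:
        total = total_rate[tuple(sorted(rxn.reactants))]
        # A zero total means no channel has a rate, so nothing can be ranked.
        if total == 0.0 or rxn.rate / total >= min_branching_ratio:
            kept.append(rxn)
    return kept


candidates = [
    CandidateReaction(("A", "B"), ("C",), 100.0),
    CandidateReaction(("B", "A"), ("D",), 0.5),   # about 0.5% of the A + B total
    CandidateReaction(("E",), ("F",), 1e-9),      # only channel for E, so kept
]
kept = prune_minor_channels(candidates)
```

This only works if a rate is available for every candidate at the time of pruning, which is why the ordering of generation and kinetics estimation matters.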

Model enlargement

Ideally, different methods to enlarge models should be modular in design. For example, a change to how a flux-based algorithm is conducted should not cause an error if the user wants to use a different generation method (like obtaining all first- and second-generation products). This would require a well-thought-out inheritance structure for model growth.

Currently there is one class for model enlargement and one class for the solver, and both are closely tied to the main algorithm in the RMG object.
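One possible shape for such a structure is a small strategy interface, sketched below. The class and function names (`EnlargementStrategy`, `FluxBasedEnlargement`, `AllEdgeEnlargement`, `enlarge`) are hypothetical and do not exist in RMG-Py; species are plain strings and fluxes are plain floats to keep the example self-contained.

```python
from abc import ABC, abstractmethod


class EnlargementStrategy(ABC):
    """Interface every enlargement method would implement (hypothetical)."""

    @abstractmethod
    def select_species(self, core, edge_fluxes):
        """Return the edge species labels to move into the core."""


class FluxBasedEnlargement(EnlargementStrategy):
    """Adds the single edge species with the largest flux above a tolerance."""

    def __init__(self, tolerance):
        self.tolerance = tolerance

    def select_species(self, core, edge_fluxes):
        if not edge_fluxes:
            return []
        label, flux = max(edge_fluxes.items(), key=lambda item: item[1])
        return [label] if flux > self.tolerance else []


class AllEdgeEnlargement(EnlargementStrategy):
    """Adds every edge species, roughly 'take the whole next generation'."""

    def select_species(self, core, edge_fluxes):
        return sorted(edge_fluxes)


def enlarge(core, edge_fluxes, strategy):
    """Main loop step: it only knows the interface, not the method behind it."""
    return core | set(strategy.select_species(core, edge_fluxes))


core = {"CH4", "O2"}
edge_fluxes = {"CH3": 0.5, "HO2": 0.2}
by_flux = enlarge(core, edge_fluxes, FluxBasedEnlargement(tolerance=0.1))
by_generation = enlarge(core, edge_fluxes, AllEdgeEnlargement())
```

With this split, a change inside `FluxBasedEnlargement` cannot break a run that uses `AllEdgeEnlargement`, because the main loop never touches either class directly.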

Kinetics estimation

RMG has an option for pressure-dependent kinetics estimation. This has not been codified into a paper, creates kinetics errors whose cause is hard to determine, and has a markedly different flow than typical reaction generation. Prof. Green mentioned that debugging it and writing a paper on it might be useful. Another option is to check whether converting a high-pressure-limit model to a pdep model after generation gives similar accuracy (with less headache). This has not yet been investigated.

Chemical structure representation

Semi-implicit electrons

Electron tracking is currently an important part of the representation, through resonance structures and the recipes used for reaction generation, but it also creates unnecessary difficulties. This idea would try to make RMG behave more like quantum chemistry software, in that electrons are not explicitly part of the molecular structure. The benefit would be reducing each chemical species to a single, unique representation; the challenge would be to teach RMG more about how electrons work.

Potential implementation:

Parallelization

Currently RMG has a parallelization function, but it actually slows down the process. This is most likely due to the cost of transferring memory between CPUs. Two options for improving this are to remove parallelization altogether or to implement it so that workers share memory ("global memory") instead of transmitting it.
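For reference, the Python standard library offers one way to share a buffer between processes without sending its contents through a pipe: `multiprocessing.shared_memory`. The sketch below is not how RMG-Py is parallelized; `publish` and `read_sum` are hypothetical helpers, and the payload is a plain list of floats. The reader is called in the same process here for brevity, but a worker process would attach by name in the same way.

```python
import struct
from multiprocessing import shared_memory


def publish(values):
    """Copy a non-empty list of floats into a named shared-memory block once."""
    shm = shared_memory.SharedMemory(create=True, size=8 * len(values))
    struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)
    return shm


def read_sum(name, count):
    """What a worker would do: attach by name and read the shared buffer."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()  # detach this handle; the block itself stays alive


block = publish([1.0, 2.0, 3.5])
try:
    total = read_sum(block.name, 3)  # only the short name would cross a pipe
finally:
    block.close()
    block.unlink()  # the creator frees the block when everyone is done
```

Only the block name and the element count need to be sent to a worker, so the per-task transfer cost no longer scales with the size of the data.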

Class restructuring

This class has the attributes number and mass, which are arrays that are just a scalar multiple of each other. Storing the two separately might cause unexpected issues when one of them is updated while the other still holds an older value.
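One common way to avoid this kind of stale-copy problem is to store a single array and derive the other on demand. The sketch below assumes the scalar factor is a mass per unit of number; the class name `Amounts` and the attribute `mass_per_unit` are hypothetical, since the class in question is not named here.

```python
class Amounts:
    """Hypothetical container that stores `number` and derives `mass`."""

    def __init__(self, number, mass_per_unit):
        self.number = list(number)          # the only stored array
        self.mass_per_unit = mass_per_unit  # assumed scalar conversion factor

    @property
    def mass(self):
        # Recomputed on every access, so it can never lag behind `number`.
        return [n * self.mass_per_unit for n in self.number]


amounts = Amounts([1.0, 2.0], mass_per_unit=12.0)
before = amounts.mass        # [12.0, 24.0]
amounts.number[0] = 3.0      # update only the stored array
after = amounts.mass         # [36.0, 24.0]
```

The trade-off is a recomputation on each access; if that turns out to be too slow, a cached value invalidated on every write to `number` would be an alternative.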