As we are moving to deprecate the developer wiki, we found a list of suggestions for the long-term development of RMG-Py. For posterity's sake, and to revive discussion, they are listed below (some information may be outdated as the last update was in 2019):
Input/Output
[ ] Custom RMG mechanism format
Design a custom file format for inputting/outputting mechanisms, to replace reaction/thermo libraries, Chemkin files, and Cantera files. The requirements and goals of such a new format would include the following:
Move away from using Python syntax and loading via an exec type command.
Maintain reasonable human readability
Include RMG generated metadata in a way that can be easily read by RMG
It should fulfill the following roles:
Reaction and thermo libraries
Seed mechanisms
Restart files
Input for RMG based post-processing including
Uncertainty/sensitivity analysis
Mechanism reduction
Conversion to other file formats such as Chemkin or Cantera
Robust metadata storage would be a key feature of the new format. Metadata is extremely fragile in all of the current formats, which rely on writing and parsing comments.
Note: much of this has been recently proposed, see #2634
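As a rough illustration of the metadata point, here is a minimal sketch that stores metadata as structured fields instead of comments and loads it without `exec`. JSON is used only because it ships with Python; the schema, the field names, the helper functions `dump_mechanism`/`load_mechanism`, and the kinetics numbers are all hypothetical placeholders, not an existing or proposed RMG format.

```python
import json

# Hypothetical schema for illustration only; none of these field names
# come from RMG-Py, and the kinetics numbers are placeholders.
mechanism = {
    "format_version": 1,
    "species": [
        {"label": "CH4", "smiles": "C"},
        {"label": "OH", "smiles": "[OH]"},
        {"label": "CH3", "smiles": "[CH3]"},
        {"label": "H2O", "smiles": "O"},
    ],
    "reactions": [
        {
            "reactants": ["CH4", "OH"],
            "products": ["CH3", "H2O"],
            "kinetics": {"type": "Arrhenius", "A": 1.0e6, "n": 2.0, "Ea": 10.0},
            # Metadata lives in real fields, so no comment parsing is needed.
            "metadata": {"source": "rate rule estimate", "family": "H_Abstraction"},
        }
    ],
}


def dump_mechanism(mech):
    """Serialize the mechanism to human-readable text."""
    return json.dumps(mech, indent=2, sort_keys=True)


def load_mechanism(text):
    """Parse the text as data (no exec) and check species references."""
    mech = json.loads(text)
    known = {spc["label"] for spc in mech["species"]}
    for rxn in mech["reactions"]:
        for label in rxn["reactants"] + rxn["products"]:
            if label not in known:
                raise ValueError(f"Undefined species label: {label}")
    return mech


loaded = load_mechanism(dump_mechanism(mechanism))
print(loaded["reactions"][0]["metadata"]["family"])
```

The same file could then serve as a library, seed mechanism, or restart file, with converters to Chemkin or Cantera reading the structured fields directly.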
Database organization
[ ] Kinetics
Kinetics is currently estimated by using rate rules, and by converting training reactions into rate rules. Ideally, we would want to be able to change how the estimation is conducted without losing the kinetics data. This would mean storing the kinetics data in a format which is semi-separate from the estimation technique, but can be used to derive the estimation parameters. Rate rules store specific groups which must exist in the current tree structure, which prevents switching to better estimation methods since the data is tied to the method. Training reactions are better, but involve labeling specific atoms based on the reaction recipe (which may also change).
Note: potentially could be addressed in #2708
Order of reaction generation and kinetics estimation
[ ] Estimate rates during reaction generation
Currently in RMG, reaction rates are estimated after `processNewReactions`, which was designed to avoid estimating kinetics for reactions that RMG has already generated. This structure might not be ideal, since estimating kinetics is not time-intensive and the fraction of reactions removed between generation and kinetics estimation might not be large. If these two assumptions hold (they have not been checked), the code should estimate kinetics during or right after reaction generation.
There is also an idea that reactions with minor branching ratios could be removed after the reaction generation step. This would be significantly easier if kinetics estimation occurred right after reaction generation, because the rates needed to compute branching ratios would already be available.
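A minimal sketch of the branching-ratio idea, assuming each candidate reaction already carries a rate coefficient estimated at generation time. The class `CandidateReaction`, the function `prune_minor_channels`, and the 1% threshold are hypothetical names and values chosen for illustration; they are not part of RMG-Py.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateReaction:
    reactants: tuple  # sorted species labels, so equal reactant sets compare equal
    products: tuple
    rate: float       # rate coefficient at the conditions of interest


def prune_minor_channels(reactions, min_branching_ratio=0.01):
    """Drop channels whose share of the total rate from the same reactants is small."""
    totals = defaultdict(float)
    for rxn in reactions:
        totals[rxn.reactants] += rxn.rate
    kept = []
    for rxn in reactions:
        total = totals[rxn.reactants]
        # Keep everything if no rate information is available for this reactant set.
        if total == 0.0 or rxn.rate / total >= min_branching_ratio:
            kept.append(rxn)
    return kept


candidates = [
    CandidateReaction(("A", "B"), ("C",), 90.0),
    CandidateReaction(("A", "B"), ("D",), 9.5),
    CandidateReaction(("A", "B"), ("E",), 0.5),
]
major = prune_minor_channels(candidates)
print([rxn.products for rxn in major])
```

A real implementation would need to evaluate branching ratios over the range of temperatures and pressures in the simulation, not at a single condition.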
Model enlargement
[ ] Make model enlargement modular
Ideally, different methods to enlarge models should be modular in design. For example, a change to how a flux-based algorithm is conducted should not cause an error if the user wants to use a different generation method (like obtaining all first- and second-generation products). This would require a well-thought-out inheritance structure for model growth.
Currently there is one class for model enlargement and one class for the solver, and both are closely tied to the main algorithm in the RMG object.
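One possible shape for such an inheritance structure is sketched below, assuming the main loop only talks to an abstract interface. All class and function names (`EnlargementStrategy`, `FluxBasedEnlargement`, `ExhaustiveEnlargement`, `enlarge`) are hypothetical and do not correspond to existing RMG-Py classes.

```python
from abc import ABC, abstractmethod


class EnlargementStrategy(ABC):
    """Interface the main loop depends on; concrete methods plug in behind it."""

    @abstractmethod
    def select_species(self, core, edge_fluxes):
        """Return the edge species labels to move into the core."""


class FluxBasedEnlargement(EnlargementStrategy):
    def __init__(self, tolerance):
        self.tolerance = tolerance

    def select_species(self, core, edge_fluxes):
        # Pick the single edge species with the largest flux, if it exceeds the tolerance.
        if not edge_fluxes:
            return []
        label, flux = max(edge_fluxes.items(), key=lambda item: item[1])
        return [label] if flux > self.tolerance else []


class ExhaustiveEnlargement(EnlargementStrategy):
    def select_species(self, core, edge_fluxes):
        # Ignore fluxes entirely and add every edge species (one "generation").
        return sorted(edge_fluxes)


def enlarge(core, edge_fluxes, strategy):
    """Main-loop step that works with any strategy."""
    return set(core) | set(strategy.select_species(core, edge_fluxes))


core = {"CH4"}
edge = {"CH3": 0.5, "C2H6": 0.01}
print(sorted(enlarge(core, edge, FluxBasedEnlargement(tolerance=0.1))))
print(sorted(enlarge(core, edge, ExhaustiveEnlargement())))
```

With this split, changing the flux criterion only touches `FluxBasedEnlargement`, and the exhaustive method cannot be broken by it.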
Kinetics estimation
[ ] Triple-check P-dep performance and potentially write into paper
RMG has an option for pressure-dependent kinetics estimation. This has not been codified into a paper, creates kinetics errors that are difficult to diagnose, and has a markedly different flow from typical reaction generation. Prof. Green mentioned that debugging it and writing a paper on it might be useful. Another option is to check whether converting a high-pressure-limit model to a pdep model after generation gives similar accuracy (with less headache). This still needs to be investigated.
Chemical structure representation
Semi-implicit electrons
[ ] Represent electrons semi-implicitly
Electron tracking is currently an important part of structure representation, through resonance structures and the recipes used for reaction generation, but it also creates unnecessary difficulties. This idea would try to make RMG behave more like quantum chemistry software, in that electrons are not explicitly part of the molecular structure. The benefit would be reducing each chemical species to a single, unique representation, and the challenge would be to teach RMG more about how electrons work.
Potential implementation:
Eliminate bond orders. Molecular graph would only indicate basic connectivity of atoms.
Introduce more detailed atom electron attributes, which would indicate electrons in pi bonds, electrons not in bonds (radicals), and lone pairs.
For reaction recipes, bond formation would entail decrementing one of the electron attributes, followed by updating the electron status of the other atoms in the molecule if necessary.
Aromaticity would then be handled as a separate flag for both atoms and bonds, based on whatever perception algorithm is used.
This representation could be converted to a standard representation by perceiving bonds based on the number of bonding electrons in adjacent atoms.
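The steps above can be sketched roughly as follows, assuming a connectivity-only graph with per-atom electron counts. The `Atom` fields and the functions `form_bond` and `perceive_bond_orders` are hypothetical, and the perception step is a naive greedy pass that returns only one localized structure and depends on edge order; it is not a general algorithm and ignores aromaticity.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    pi_electrons: int = 0       # electrons available for pi bonding
    radical_electrons: int = 0  # unpaired electrons not in bonds
    lone_pairs: int = 0


def form_bond(atoms, connectivity, i, j):
    """Recipe action: join two radical sites by consuming one electron from each."""
    if atoms[i].radical_electrons < 1 or atoms[j].radical_electrons < 1:
        raise ValueError("Both atoms need an unpaired electron to form a bond")
    atoms[i].radical_electrons -= 1
    atoms[j].radical_electrons -= 1
    connectivity.append((i, j))


def perceive_bond_orders(atoms, connectivity):
    """Greedy conversion to explicit bond orders from pi electron counts."""
    remaining = [atom.pi_electrons for atom in atoms]
    orders = {}
    for i, j in connectivity:
        shared = min(remaining[i], remaining[j])
        orders[(i, j)] = 1 + shared  # sigma bond plus shared pi electrons
        remaining[i] -= shared
        remaining[j] -= shared
    return orders


# Carbon backbone of 1,3-butadiene (hydrogens omitted): one pi electron per carbon.
carbons = [Atom("C", pi_electrons=1) for _ in range(4)]
print(perceive_bond_orders(carbons, [(0, 1), (1, 2), (2, 3)]))
```

A robust version would need a matching-based assignment and a rule for when several localized structures are equivalent.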
Parallelization
[ ] Implement real parallelization
Currently RMG has a parallelization function, but it actually slows down the process. This is most likely due to transferring memory between CPUs. The options for improving this are to remove parallelization altogether or to implement it in a way that shares memory without transmitting it ('global memory').
Class restructuring
[ ] Rework Conformer.pyx
This class has attributes number and mass, which are arrays that are just a scalar multiple of each other. Storing the two separately might cause unexpected issues when one of them is updated while the other is left holding an older version.
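One way to avoid the stale-copy problem is to store a single array and derive the other on access. The sketch below assumes the relationship is exactly the scalar multiple described above; the class name `ConformerSketch` and the parameter `number_per_mass` are hypothetical, the factor 2.0 is a placeholder, and plain tuples stand in for the arrays used in `Conformer.pyx`.

```python
class ConformerSketch:
    """Stores mass once and derives number, so the two cannot drift apart."""

    def __init__(self, mass, number_per_mass):
        self._mass = tuple(float(m) for m in mass)
        self._number_per_mass = float(number_per_mass)  # hypothetical scalar factor

    @property
    def mass(self):
        return self._mass

    @mass.setter
    def mass(self, values):
        self._mass = tuple(float(m) for m in values)

    @property
    def number(self):
        # Recomputed from mass on every access; there is no second copy to update.
        return tuple(m * self._number_per_mass for m in self._mass)


conf = ConformerSketch(mass=[1.0, 12.0], number_per_mass=2.0)
conf.mass = [1.0, 16.0]
print(conf.number)
```

If recomputing on every access turns out to be too slow, the derived array could be cached and invalidated in the `mass` setter.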