Closed mliu49 closed 8 years ago
Found the reason why this doesn't work:
rmgpy.rmg.react.reactMolecules
uses getDB('kinetics').families
. The retrieved database is a global database through rmgpy.data.rmg.getDB
. The returned families is an empty dictionary, which is why it will react over no families.
Basically the error msg comes from the global database not being set by the RMGDatabase
instance that was created when we initialize the RMG
object in generate_reactions
: see line 67 in rmgpy.data.rmg
. Therefore it cannot retrieve the proper database.
I'm not sure I completely understand. Why does it work if executed from the command line, but not through the website?
Ok, I understand what you mean now.
generate_reactions
initializes a second database (Database B) and loads the necessary datareactMolecules
uses getDB, which queries the global database (Database A) for dataWhen running from the command line, Database A does not exist, so the global database and the database created by generate_reactions
are the same object.
This doesn't seem to be desirable behavior, but I'm not sure what the best way to fix it would be.
As a side note, the live website is still on a commit prior to the implementation of getDB, which is why it works.
It seems that this problem needs to be solved via changes in RMG-Py, since the website has to initialize the database on startup. Possible options that I can think of include the following:
rmgpy.rmg.main.RMG.loadDatabase()
, somehow check to see if the global database exists, and if so, don't create a new instance
rmgpy.data.rmg.RMGDatabase()
, assign the global database to self if the global database already exists
Any thoughts or better suggestions would be appreciated!
Regarding option 1: is this not already happening in the constructor of rmgpy.data.rmg.RMGDatabase()
? Could we add a similar check to the rmgpy.data.rmg.RMGDatabase.loadDatabase()
method?
# Module-level variable to store the (only) instance of RMGDatabase in use.
database = None
class RMGDatabase:
"""
The primary class for working with the RMG database.
"""
def __init__(self):
...
# Store the newly created database in the module.
global database
# assert database is None, "Should only make one instance of RMGDatabase because it's stored as a module-level variable."
if database is None:
database = self
else:
logging.warning("Should only make one instance of RMGDatabase because it's stored as a module-level variable!")
logging.warning("Unexpected behaviour may result!")
It is being checked in rmgpy.data.rmg.RMGDatabase()
. but at that point, a new RMGDatabase() object has already been created, which leads to self and global database becoming two different objects. The intention of option 1 would be to perform the check earlier, before rmgpy.rmg.main.RMG.loadDatabase()
does self.database = RMGDatabase()
. However, the more I think about it, the less viable that idea seems.
I've not been following the context very closely, but how about something like
In [1]: import rmgpy.data.rmg
In [2]: the_one_and_only = rmgpy.data.rmg.database or rmgpy.data.rmg.RMGDatabase()
In [3]: id(the_one_and_only)
Out[3]: 4418936200
In [4]: the_one_and_only = rmgpy.data.rmg.database or rmgpy.data.rmg.RMGDatabase()
In [5]: id(the_one_and_only)
Out[5]: 4418936200
?
Option 1: The problem with this is that the two databases loaded will actually be different. The populate reactions input file will have the user specified thermo and kinetics libraries, whereas the website loaded one generally has ALL of the libraries I think? Then again I'm not sure what's going on because none of the families are loaded in that one. But there will definitely be a problem with the different database contents.
For the reason of different loading affecting the generated outcome.. none of the options seem super viable to me. Unless there is some way to not use all the contents (families/libraries) loaded within an RMGDatabase
object but instead selectively obtain them.
The website preloads the database in order to populates lists for thermo libraries, kinetics libraries, and solvents, for the input file generator and solvation search. Therefore, it doesn't load the kinetics families.
Nick also suggested the option of loading everything at the beginning, but then the generate_reactions script would need to be reworked to account for user selection of families and libraries. Currently, it relies heavily on methods used for typical RMG runs, so it expects the database to be loaded according to the input file.
With respect to the databases being different, after the rmgpy.data.rmg.RMGDatabase()
object is created, rmgpy.data.rmg.RMGDatabase.load()
is called to actually load the databases, which I think basically resets what's been loaded. The issue is that the load method loads everything to the self object, while the getDB methods gets data from the global database.
Perhaps changing the load method to load to the global database might be an option? However, if that were done, whenever an RMGDatabase() object is instantiated, a new object would be created but never used because loading and getting data would all occur with the global object.
Stepping back though, the idea that the website reloads the global database object every time a user submits a populate reactions task seems to be undesirable. Perhaps it would actually be best to redesign the website to load everything once, then have populate reactions only use the user selected libraries/families. However, then it would diverge from the generate reactions script in RMG-Py.
Sorry for so much text. It doesn't seem like there's going to be a simple solution.
After discussing with Connie, we think the easiest solution is to actually revert to using a subprocess call. It doesn't seem worth it to modify multiple parts of the core RMG code to fix one website tool.
The website was not updated following ReactionMechanismGenerator/RMG-Py@bfa7516fc8b80691d414c76e18099bece4d1cde1, so the PopulateReactions tool does not work at all, as can be seen on the dev site.
However, even after making the appropriate change (5ca3188632e51125e2713d9829b5e7e152dfda61), the tool still does not seem to work properly when tested locally. Specifically, it will not find a single reaction. Testing the RMG-Py script directly from the command line gives the expected result, so the problem appears to be on the website side. The live website is currently working, but when I checked out the same commits locally, it also gave zero reactions.
One difference I noticed is that the website will give the following warning
while running from the command line will not, which I assume is because the website preloads the database? However, I don't know why that would result in not finding any reactions, and I'm also extremely confused as to how the live website is working.
I'm also working on writing unit tests for this.