Issues Estimating correct energies for simple reactions, huge list of database references

DetlevCM commented 8 years ago

This is a bizarre error as I wasn't able to obtain a minimum working example when it was first discovered for toluene, however I noticed that it exists for Octane too which makes it so bizarre.

So, to describe the problem: A mechanism is generated under liquid phase conditions using a large number of seed species (all at zero concentration except the initial reactants) and RMG is started. This is at moderate temperatures - 400-500K.

As the code runs (weirdly only in a real job, not a minimum example) it starts to write very long references in the annotated chemkin file for the estimated kinetics. For an aromatic species I would have guessed the ring causes issues, but I am surprised by the same being observed with Octane which has prompted me to post this here.

The reaction described is R. + ROOH = RH + ROO for Octane. The mechanism is fine for the carbon "in the chain" (first 3 reactions in attached file) and it then fails for the end group carbon.

I've attached a cutout from the chem_annotated.inp file issue.txt

The version of RMG that caused this issue was:

The current git HEAD for RMG-Py is: 94e998c2a8273fc8f692cc2dbdf7f0b40e21b6ba Fri Mar 25 13:42:41 2016 +0100

The current git HEAD for RMG-database is: 724c086d26e5e50ee5221a126639cc07b3c72f2d Sun Feb 28 23:57:07 2016 -0500

connie commented 8 years ago

This happens if you have

options(
     verboseComments = True
)

in the input.py file. Is this the case?

connie commented 8 years ago

As for why the code actually fails... can you provide the traceback?

DetlevCM commented 8 years ago

Yes @connie, I do indeed have verboseComments = True in my input file.

How would I get a traceback? It does run - but annoyingly produces this huge (suspect?) output.

connie commented 8 years ago

The verboseComments flag generates the verbose ouputs in the chemkin file..

DetlevCM commented 8 years ago

@connie The verbose output is nice an well, though should a reaction as simple as R. + ROOH = RH + ROO. for Octane not have a comparatively short entry in the Chemkin file? I cannot see how it takes an average of hundreds if not thousands of database values to estimate such as reaction.

The first three reactions in the attached text file (original post) are normal in that they are short. Other entries may take a couple of lines, but these huge blocks don't seem right.

bslakman commented 8 years ago

For some reason (as opposed to the first few reactions in the snippet you gave) the kinetics are being estimated in the reverse direction, leading to a very general node being hit (that's why it's printing out all the children nodes):

! Estimated using template (X_H;Y_rad) for rate rule (Orad_O_H;C_rad/H2/Cs)
! Kinetics were estimated in this direction instead of the reverse because:
! Both directions are estimates, but this direction is exergonic.
! dHrxn(298 K) = -197.57 kJ/mol, dGrxn(298 K) = -191.94 kJ/mol
HO2(61)+C8H17(63)=oxygen(2)+Octane(1)               1.606e+04 2.560     9.268

The comments say it's done this way because "both directions are estimates". Searching on the RMG website octane + O2 produces 4 reactions for the 4 different reacting sites, and a couple of them do match an exact rate rule, but some are being estimated using a template... however in this octane + O2 direction, the template being used is more specific than (X_H;Y_rad). So I don't think anything is actually wrong, except that maybe estimating in the exergonic direction isn't necessarily best. It seems to me that if both directions are estimates, we should be using the more specific template.

DetlevCM commented 8 years ago

Thanks for chiming in @bslakman

So you suggest the huge "blob" is actually wanted? - Unless I am mistaken, the, let me say text, specifies which families in the RMG database are picked to estimate parameters for this reaction. If yes, it should then just follow the tree in the RMG database along a line of "best fit" and average where more than one entry fits, no? This may produce a large entry for a badly matches species but no such a huge entry.

Then looking at the list in detail, I find entries such as this O_pri;Ct_rad/N + C_methane;Ct_rad/N + NH3;Ct_rad/N. What is that doing there? There is no nitronge in my liquid phase system - not at all. Finally, when I firts encountered these I tried to reproduce them with a minimum working example and was unable to do so, but with a mechanism seed they turn up in my mechanisms?

bslakman commented 8 years ago

It's wanted in the sense that it's not a bug, but I don't think we actually want it. For this particular reaction, we follow the tree, and find that we have a rate rule describing its reacting functional groups with (Orad_O_H;C_rad/H2/Cs). However, we must not have data for this rate rule, meaning that we go to more general nodes which also don't have data, until finally we reach (X_H;Y_rad) and average its children. That's why we have those entries containing nitrogen (since for example Ct_rad/N is a child of Y_rad which is just anything with one radical electron). In this specific example it's really not a great match but that is how it's designed to work. It's my opinion that we should be using the kinetics estimate from the reverse direction, where we have data for a more specific node, but because it's still an "estimate" and not an exact match, RMG is using the kinetics in the exergonic direction according to the comments. As to your last point, can you explain more what you mean by your minimum working example and the one with the seed?

DetlevCM commented 8 years ago

Thanks for the explanation. I'm surprised it would end up so long...

Coming back to the point about a seed:

I have an input file with many zero concentration species that are expected to be formed (the octane one is from a colleague, the toluene one is my own). (As per https://reactionmechanismgenerator.github.io/RMG-Py/users/rmg/guidelines.html#start-with-a-good-seed-mechanism really)

When I first saw these I then tried to produce a minimum example - which meant similar input conditions, but no seed and a smaller species for the reactant, gradually stepping up. The relevant reaction gets found quickly, but I never saw such a huge "blob" for the kinetics estimation.

bslakman commented 8 years ago

Can you paste the chemkin output for the same reaction in your minimum example which doesn't have the blob?

DetlevCM commented 8 years ago

I'll need to re-run it. I will have a look once I am back in the office tomorrow. I don't think I have the original files and as a plus, it will be run on the latest RMG-Py release too.

rwest commented 8 years ago

The averaging algorithm is explained at the end of section 2.3 in the recent RMG paper http://dx.doi.org/10.1016/j.cpc.2016.02.013 It does lead to remarkably large "average of (average of (average of ....)))" scenarios once you fall more than one or two levels up the tree, especially because the tree is the square of the size you expect (there's a tree for X_H and a tree for Y_rad, and all possible combinations are considered and averaged). We tried a group additive approach to estimate kinetics that gave much cleaner comments and an easier to understand and explain source, but @jwallen's PhD Thesis showed that it often gave worse estimates, so it is not the default (and in fact may no longer work; it used to be that you could specify kineticsEstimator = 'group additivity'.)

Belinda's suggestion about using the probably-better direction instead of the exergonic direction is worth investigating, but in my experience our intuition about which method will be most accurate is often wrong and I'd suggest someone collects empirical evidence as to which gives the most robust estimates before we change it. I seem to recall estimating endergonic directions poorly and finding the reverse from Kc has previously led to estimates being much too fast, causing problems in the pressure-dependence code.

The long comments can be helpful to see which estimates are quite likely to be poor approximations. But yes, the verbose comments can be cumbersomely long and not always helpful, which is why the verboseComments = False option was added.

DetlevCM commented 8 years ago

@rwest Thank you for the very detailed explanation - now I understand where the "blob" comes from. Also thank you for the references.

@bslakman On the minimum working example it seems I have hit an RMG problem with the newest version... at least it no longer runs for me failing with the following: (Did I manage to grab a build that doesn't work?)

Traceback (most recent call last):
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmg.py", line 165, in <module>
    rmg.execute(inputFile, output_dir, **kwargs)
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmgpy/rmg/main.py", line 552, in execute
    self.saveEverything()
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmgpy/rmg/main.py", line 796, in saveEverything
    self.notify()
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmgpy/util.py", line 106, in notify
    observer.update(self)
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmgpy/restart.py", line 93, in update
    save(rmg)
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmgpy/restart.py", line 42, in save
    delay=0 if rmg.done else rmg.saveRestartPeriod.value_si
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmgpy/restart.py", line 65, in saveRestartFile
    cPickle.dump(reactionModel, f, cPickle.HIGHEST_PROTOCOL)
  File "/home/irsrvshare2/R10/L0277/users/mielczad/VPYTH_uptd/RMG-Py-Mod/rmgpy/data/kinetics/family.py", line 126, in __reduce__
    self.reverse,
AttributeError: 'TemplateReaction' object has no attribute 'reverse'

nickvandewiele commented 8 years ago

We are currently having trouble with the restart feature of RMG.

Ensure that the restart option in your RMG-Py input file is set to None:

options(
saveRestartPeriod=None
...)

This has also been reported here.

DetlevCM commented 8 years ago

@nickvandewiele OK, thanks. I had the restart period set to 24h ... let's see what happens.

DetlevCM commented 8 years ago

Good news for everybody I think - at least on this issues. @bslakman my minimum working example gave me the "huge blob" now - so I guess that is how it works, thanks. (Not sure what I did a few months ago when it didn't...) Thanks to @rwest for elaborating further how the large entry forms and thanks to @nickvandewiele for pointing out the issue with the restart period.

ReactionMechanismGenerator / RMG-Py

Issues Estimating correct energies for simple reactions, huge list of database references #658