Open davidfarinajr opened 3 years ago
I am not sure how widespread this problem is (how many training reactions are in reverse with species not in "commonly used" thermo libraries), but I think it could become more widespread as we add more training reactions for different systems (halogens, catalysis, sulfur, etc.)
I don't think this is quite as big of a problem as it first appears, as the bigger problem is usually that you're using rates from one chemistry to estimate rates of another chemistry.
However, the automated rate trees won't have this issue (because they "compile" their rules during training), so if this is a problem, the easiest way to solve it is to simply convert the family to an automated tree.
I agree, and yes, converting to automated trees would fix this, but perhaps we should at least raise a warning if we use group additivity to estimate thermo for a training reaction in reverse when we are making the kinetics rules at the start of an RMG job. This would have saved me time anyway in trying to figure out why my Birad_R_Recombination rates were way too fast.
That might work well for Birad_R_Recombination, which has only a few reactions with mostly small species, but when you do this for everything I'm pretty sure you'll see way more of these cases than any human wants to see, and they probably won't know what to do about it.
There seems to be one argument that it's not a big problem, and another argument that there will be way more cases than any human wants to see.
Maybe we could run a script through the current database to see how common it actually is?
Sure, I did this yesterday using my halogens database branch, but all of the training reactions in master should also be on that branch. This spreadsheet has ~300 species in reverse training reactions from the default families (I didn't look at surface families).
['C3',
'thermo_DFT_CCSDTF12_BAC',
'SABIC_aromatics',
'NISTThermoLibrary',
'BurcatNS',
'JetSurF1.0',
'JetSurF2.0',
'SulfurHaynes',
'C10H11',
'DFT_QCI_thermo',
'Lai_Hexylbenzene',
'naphthalene_H',
'Fulvene_H',
'CBS_QB3_1dHR',
'CH',
'vinylCPD_H',
'Narayanaswamy',
'primaryThermoLibrary']
This is a list of thermo libraries from that spreadsheet that contain thermo for at least one training species. I added this list to the end of my thermoLibraries list in my input file, and that seems to have improved things.
This list is dependent on the thermo library ordering, but it should still cover most of the training species.
This problem will go away once all of the families in that spreadsheet are autogenerated. However, in the meantime, I think it's best to add these thermo libraries to rmg input files.
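A minimal sketch of appending these libraries to an existing `thermoLibraries` list without reordering or duplicating entries. The helper name `append_missing_libraries` and the example `my_libraries` list are hypothetical, not part of RMG. The sketch assumes earlier entries in the list take priority, so the user's own libraries stay first.

```python
def append_missing_libraries(user_libraries, extra_libraries):
    """Return user_libraries followed by any extra_libraries not already listed.

    Order is preserved, so the user's own libraries keep their position
    (assumed here to mean higher priority) ahead of the appended ones.
    """
    merged = list(user_libraries)
    seen = set(merged)
    for name in extra_libraries:
        if name not in seen:
            merged.append(name)
            seen.add(name)
    return merged


# Libraries from the list above that cover at least one training species.
training_species_libraries = [
    'C3', 'thermo_DFT_CCSDTF12_BAC', 'SABIC_aromatics', 'NISTThermoLibrary',
    'BurcatNS', 'JetSurF1.0', 'JetSurF2.0', 'SulfurHaynes', 'C10H11',
    'DFT_QCI_thermo', 'Lai_Hexylbenzene', 'naphthalene_H', 'Fulvene_H',
    'CBS_QB3_1dHR', 'CH', 'vinylCPD_H', 'Narayanaswamy', 'primaryThermoLibrary',
]

# Hypothetical list already present in an input file.
my_libraries = ['primaryThermoLibrary', 'BurcatNS']

thermo_libraries = append_missing_libraries(my_libraries, training_species_libraries)
print(thermo_libraries)
```

The resulting list can then be pasted into the `thermoLibraries` argument of the `database(...)` block in the input file.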
Oops, forgot to check for isomorphic species in that list. The number of unique species is ~230.
XD What I mean is that GAV is used to reverse training reactions way more often than it significantly impacts estimation during your run. A lot of the time GAV works well enough, and the fact that some of a different chemistry's reactions are reversed poorly usually isn't going to affect estimates in other chemistries (and when it does, it's usually because we really don't have good training data, not because reversing it properly solves the problem).
I also don't think it's good to just assume that RMG library values are better than GAV. I'm not incredibly familiar with all of the libraries above, but I wouldn't inherently trust at least C3 or CH over GAV estimates.
Yes, free energy estimates at 298K with GAVs are within 2 kcal/mol of library values for most of them. So most of the time, GAV estimate is good enough. However, there are a few cases where the GAV estimates are not good, particularly for small stuff like NO and some rings. So I don't think it's a widespread issue, but we should probably include NO in primaryThermoLibrary so we don't use GAV for it.
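For a sense of scale, here is a rough sketch of how an error in the reaction free energy carries over to a rate that is obtained by reversing a training reaction. It assumes the usual relation k_rev = k_fwd / K_eq with K_eq = exp(-ΔG/RT), so an error δ in ΔG multiplies the reversed rate by exp(|δ|/RT). The function name is hypothetical and this is not how RMG itself is implemented.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def reverse_rate_error_factor(dG_error_kcal, temperature):
    """Multiplicative error in a reversed rate coefficient caused by an
    error of dG_error_kcal (kcal/mol) in the reaction free energy at the
    given temperature (K), assuming k_rev = k_fwd / K_eq."""
    return math.exp(abs(dG_error_kcal) / (R_KCAL * temperature))


for T in (298.0, 1000.0, 1500.0):
    factor = reverse_rate_error_factor(2.0, T)
    print(f"T = {T:6.0f} K: 2 kcal/mol error -> factor of {factor:.1f}")
```

Under these assumptions a 2 kcal/mol error corresponds to roughly a factor of 29 at 298 K and about 2.7 at 1000 K, so the same thermo error matters much less at combustion temperatures than at room temperature.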
If training reactions are written in reverse, we use thermo to fit the kinetics in the forward direction. If the thermo is poor, the rate we train the kinetics tree with will be poor as well.
For example: Training reaction 2 in Birad_R_Recombination
NO2_p <=> NO + O
is in reverse direction. To reverse it, we need to get the thermo for NO2, NO, and O. O is in the primaryThermoLibrary, but NO and NO2 are not. If NO and NO2 are not in any of the thermo libraries specified in the input file when running RMG, then RMG uses group additivity to estimate the thermo, and since these estimates are not so good, the kinetics will be not so good. Therefore, even if we are running RMG without nitrogen in our system, we need good thermo for NO and NO2, otherwise the Birad_R_Recombination tree might be trained with poor kinetics.
Possible solutions: