ReactionMechanismGenerator / RMG-Java

The Java version of RMG: Reaction Mechanism Generator
http://rmg.sourceforge.net/
MIT License
29 stars 36 forks source link

Polycyclics thoughts: on-the-fly vs group additivity #273

Open nickvandewiele opened 11 years ago

nickvandewiele commented 11 years ago

Just a post to share my experiences with thermochemistry for polycyclics.

Greg implemented the semi-empirical PM3 , molecular mechanics MM4 features. I extended Benson for polycyclics by creating a new polycyclic ring correction library. Both were fueled by the sense that something was very wrong with the thermo for polycyclic molecules.

Both have weaknesses and strengths:

Could not find a non trivial ring correction!Trying QMTP...
*****Warning: zero freqencies found (values lower than 7.7 cm^-1 are rounded to zero in MM4 output); CanTherm should hopefully correct this:
Pre-existing successful MM4 result for REXBDNBUYGXXMZ-UHFFFAOYAU (InChI=1/C8H12/c1-2-5-8-6-3-4-7-8/h2H,3-4,6-7H2,1H3) has been found. This log file will be used.
Thermo for REXBDNBUYGXXMZ-UHFFFAOYAU: 323.25    129.02  36.22   46.55   55.81   63.64   75.79   84.59   97.81   
Thermo for REXBDNBUYGXXMZ-UHFFFAOYAU_RRHO: 323.25   129.02  36.22   46.55   55.81   63.64   75.79   84.59   97.81   
Created new species: SPC(4304) InChI=1/C8H12/c1-2-5-8-6-3-4-7-8/h2H,3-4,6-7H2,1H3
ChemFormula: C8H12
1  H 0 {7,S}
2  C 0 {3,S} {9,S} {10,S} {11,S}
3  C 0 {2,S} {4,S} {12,S} {13,S}
4  C 0 {5,D} {3,S} {8,S}
5  C 0 {4,D} {6,D}
6  C 0 {5,D} {7,S} {14,S}
7  C 0 {6,S} {1,S} {15,S} {16,S}
8  C 0 {4,S} {9,S} {17,S} {18,S}
9  C 0 {2,S} {8,S} {19,S} {20,S}
10  H 0 {2,S}
11  H 0 {2,S}
12  H 0 {3,S}
13  H 0 {3,S}
14  H 0 {6,S}
15  H 0 {7,S}
16  H 0 {7,S}
17  H 0 {8,S}
18  H 0 {8,S}
19  H 0 {9,S}
20  H 0 {9,S}

The number of resonance isomers is 0
The NASA data is 
!Group:Cs-CsCsHH    Group:Cs-(Cds-Cdd-Cd)CsHH   Group:Cds-(Cdd-Cd)CsCs  Group:Cdd-CdsCds    Group:Cds-(Cdd-Cd)CsH   Group:Cs-(Cds-Cdd-Cd)HHH    Group:Cs-(Cds-Cdd-Cd)CsHH   Group:Cs-CsCsHH 
!MM4 calculation with CanTherm analysis
! [_ SMILES="InChI=1/C8H12/c1-2-5-8-6-3-4-7-8/h2H,3-4,6-7H2,1H3" _]
SPC(168)                C   8H  12          G   250.000  5000.000   995.043    1
 1.71088574E+01 3.64686093E-02-1.30374883E-05 2.10165108E-09-1.25939495E-13    2
 1.54107129E+05-4.74289947E+01 3.22989036E+00 3.56896401E-02 7.34166570E-05    3
-1.12958069E-07 4.31375788E-11 1.59669759E+05 3.35327699E+01                   4

ThermoData is 
323.25  129.02  36.22   46.55   55.81   63.64   75.79   84.59   97.81   

For some reason, unknown to me, the enthalpy by MM4 ends up being 323 kcal/mol while 19 kcal/ mol is a reasonable estimate.

In conclusion: I am thinking of creating a new thermochemistry for polycyclics strategy where enthalpies are preferrably based on polcycylic library corrections while S, and Cps are calculated on-the-fly, and thus benefitting from the strengths of both approaches.

this should definitely improve the accuracy but will lead to longer simulation times, compared to the current strategy, where on-the-fly calcs are only done on species that did not have library entries.

Let me know what you think about this.

keceli commented 11 years ago

Here is a sample Mopac file that can give two different outputs based on the machine it was run.

pm3  precise nosym gnorm=0.0 bfgs
InChI=1/C8H10/c1-3-8-6-4-5-7(8)2/h3-4,6H,1,5H2,2H3

C   1.65870 1 -0.67670 1  0.19900 1
C   0.42230 1  0.13240 1  0.20490 1
C   0.33840 1  1.58310 1  0.31480 1
C  -0.93230 1  1.94420 1  0.28390 1
C  -1.82610 1  0.75820 1  0.14920 1
C  -0.81740 1 -0.34660 1  0.10950 1
C  -1.19930 1 -1.78890 1 -0.02140 1
C   2.88300 1 -0.16430 1  0.29630 1
H  -0.84030 1 -2.36120 1  0.86350 1
H   3.74640 1 -0.82220 1  0.28490 1
H   1.57790 1 -1.75620 1  0.11060 1
H   1.16210 1  2.28270 1  0.40800 1
H  -2.30490 1 -1.90590 1 -0.08210 1
H  -1.27360 1  2.97230 1  0.34860 1
H  -0.75590 1 -2.21910 1 -0.94760 1
H  -2.49530 1  0.66210 1  1.03420 1
H   3.06650 1  0.90050 1  0.38840 1
H  -2.41000 1  0.80580 1 -0.79800 1

oldgeo thermo nosym precise pm3

RMG generates one of these thermo values based on Mopac output (where the difference originated):

Thermo for FBVAULMWQCXHBR-UHFFFAOYAF: 37.32 96.33 32.48 42.08 50.48 57.45 68.1 75.77 87.42

Thermo for FBVAULMWQCXHBR-UHFFFAOYAF: 37.11 90.41 32.48 42.11 50.52 57.5 68.16 75.83 87.47

There is no fault on RMG side for this problem. As Nick mentioned, this is related to the number of cycles in Mopac optimization step. In machines with different floating point handlings (e.g. 8 core nodes vs 48 core nodes on pharos) you can observe that one can achieve convergence earlier than other.

Possible solutions are either recompiling Mopac (or maybe updating to a newer version) with flags that ensure numerics are independent of the machine (Ray's suggestion) or setting a tighter convergence (change first line as: pm3 let precise nosym gnorm=0.0001 SCFCRT=1.D-10) to minimize the difference.

If we can find an open source (with GPL or similar license) alternative to Mopac for PM3 calculations, that might be the best solution, since it gives the option to pack it with RMG.

nickvandewiele commented 11 years ago

the interesting thing is also that significant differences in this example are limited to the entropy, and not to the heat capacity. any idea if this is the case for other examples too?

jwallen commented 11 years ago

I'm trying to merge the polycyclics work into master for the 4.0 release, but when I test it I found a problem: monocyclics are not having any ring corrections applied (during a regular RMG job without QMTP). The log file is full of Could not find a non trivial ring correction! messages for the cyclic species. This cyclic species is one example; its enthalpy is off by ~6 kcal/mol and entropy by ~30 cal/mol*K from the value given on the master branch. Any suggestions, @nickvandewiele?

nickvandewiele commented 11 years ago

I assume that the strategy you used to deal with cyclics is the one in which QMTP is explicitly not used. In case of a monocyclic species, RMG searches through Ring_Library/Dictionary/Tree.txt to find an apt ring correction. If there was not a single ring correction that fitted the description of the SSSR atoms, then it warns the user that thermo for that species did not contain any ring correction.

Previously, RMG would silently use the most generic ring correction, e.g. five-membered-ring (or which one did it use in this case). Currently, it does not add any correction, but instead warns the user about this. Next, the user would need to supply a more accurate thermo for that particular ring correction/species, or use the strategy that automatically switches to QMTP to provide thermo estimates.

So, the printed message in the log warns the user that he/she should revise the ring correction libraries. That was the behaviour i aimed at.

In conclusion, the discrepancies of 6 kcal/mol are presumably the correction for a generic_five_member_ring that is pointing to something like cyclopentane.

Does that make sense to you?

jwallen commented 11 years ago

That makes sense. However, I think we should still use the generic ring corrections as before, even if they aren't a perfect match, while keeping the error message. (Maybe only print the error message for polycyclics, not monocyclics?) Otherwise people would need to run QMTP all the time, which I think is overkill for the model I am trying to build. The only significant cyclics in my model are monocyclic ethers formed from the low-temperature peroxy chemistry, and the generic ring corrections were good enough to get the major cyclic ethers correct without QMTP. Basically, I think that some ring corrections + warning is better than no ring corrections + warning.

At some point we may want to gather warnings such as these and print them at the end of the log file so they are easier to find (similar to the way LaTeX does), but that's for another issue.

nickvandewiele commented 11 years ago

ok, fair enough. I understand your point. I think it's pretty easy to change to "corrections + warning". I'll look into where exactly this needs to be changed.

Of course, this strategy was based on my experiences with my systems of interest where crazy species such as C1=C=CCC1 kept on accumulating because they were as stable as "regular" cyclopentanes.

connie commented 11 years ago

We were able to get the old functionality by removing lines 356-357 in ThermoGAGroupLibrary.java and replacing with an error warning message instead of returning null: https://github.com/connie/RMG-Java/blob/master/source/RMG/jing/chem/ThermoGAGroupLibrary.java#L356

This will probably screw up the hybrid and qmtp schemes completely though. So we need to figure out how to maintain the functionality for hybrid and qmtp but still have generic ring strain corrections incorporated when it's BensonOnly.

nickvandewiele commented 11 years ago

Without having tested anything:

why don't you change the value 1 to 0? line 373:

if(deepest <= 1)

what this should do is take the L1 hit (e.g. L1: FiveMember) in Ring_Tree.txt as a value for the ThermoGAValue. Only if the ring hits L0 ("Ring"), then it will return null.

This seems to be less invasive than your idea.

jwallen commented 11 years ago

I think that would have the same effect as my idea (fix the BensonTDGenerator but break the HybridTDGenerator) because I don't think we ever stop at the L0 node since there are L1 nodes for three-membered to ten-membered rings.

The challenge here seems to be that ThermoGAGroupLibrary.findRingCorrections() must return null in order to trigger the hybrid method, but must return non-null in order for the ring corrections to be applied in the Benson-only method. What we need is for this method to return both (1) the best available ring corrections for the Benson method and (2) a flag indicating the quality of the corrections so we can use the hybrid method. Naturally, Java doesn't allow for easily returning two things from a method. I'm toying with creating a simple new class (something like RingCorrections) that stores both pieces of information, but am more than happy to defer to anyone with a better idea.