ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/
Other
381 stars 226 forks source link

On the symmetry contribution of radicals #375

Closed nickvandewiele closed 9 years ago

nickvandewiele commented 9 years ago

This issue discusses the way the symmetry contribution to the thermochemistry of radicals is calculated, in the particular case when the thermochemistry of the radical is based on the thermochemistry of the saturated parent molecule.

Background

So, in general, we need to add a symmetry contribution to the entropy S of a species:

S_symm = -R * ln(sigma)

If we estimate the entropy of the species via a method that does not take this contribution into account. For example, if the thermochemistry of the species is calculated via group additivity, we need to add this symmetry contribution, yielding the total entropy:

S_total = S_symm + S_ga

Thermochemistry for radicals

The story becomes a bit more complex for radicals.

In RMG, we do not have group additive ways to calculate the thermochemistry of radical species as we do for closed-shell species. Instead, we have two ways of determining the thermochemistry of radical species:

  1. Thermolibraries
  2. Hydrogen Bond Increment method

In the case of thermolibraries, we simply use the thermochemistry of the species stored in the library. In thecase of the Hydrogen Bond Increment method we base the thermo of the radical species R* off of the thermochemistry of its saturated parent molecule RH and a Hydrogen Bond Increment:

X_R* = X_RH + X_HBI

Where X is a thermochemical variable like H, S, or Cp.

Symmetry for radicals

Now, an important question arises

When do we add a symmetry contribution to the entropy of a radical species?

Clearly, the following principle is trivial: Do not add a symmetry contribution when the thermochemistry of the radical species originates from a thermo library.

But what happens when thermochemistry of the radical species originates from the HBI method? Now, the highly confusing part begins where we start thinking about the symmetry contribution for radical species. And this is where the current master (3fe7cb13207f6a5965d66f2b17d93327177e2f7c) gets it wrong.

An important question to ask here is: Does the HBI incorporate the differences in symmetry between the radical and the parent molecule?

The answer is: No, at least not in the way it was designed by Bozzelli et al. From their paper (http://pubs.acs.org/doi/pdf/10.1021/j100039a045):

Entropy corrections accounting for changes in symmetry between the parent molecule and radical are not included in the HBI group values. These corrections need to be separately considered for each radical and parent molecule.

OK, that means that the thermo of the parent molecule should not contain anything related to the symmetry of the molecule!

Symmetry for saturated parent molecules

The thermochemistry of the saturated parent molecule RH can originate from 3 different sources:

In the case of group additivity, this piece of code is relevant (check the nice documentation in the method name, RMG-JAVA style!):

rmgpy.data.thermo.ThermoDatabase:
def estimateThermoViaGroupAdditivity(self, molecule):
 ...
         if molecule.isRadical(): # radical species
            return self.estimateRadicalThermoViaHBI(molecule,  self.estimateThermoViaGroupAdditivityForSaturatedStructWithoutSymmetryCorrection)

        else: # non-radical species
            thermoData =  self.estimateThermoViaGroupAdditivityForSaturatedStructWithoutSymmetryCorrection(molecule)

        molecule.calculateSymmetryNumber()
        thermoData.S298.value_si -= constants.R * math.log(molecule.symmetryNumber)

In other words, when the thermo of the parent molecule is calculated via group additivity, everything is taken care of. Nice!

Let's look at the other sources of thermochemistry:

rmgpy.rmg.model.Species:
def generateThermoData(...):
...
tdata = database.thermo.estimateRadicalThermoViaHBI(molecule, database.thermo.getThermoDataFromLibraries)
tdata = database.thermo.estimateRadicalThermoViaHBI(molecule, quantumMechanics.getThermoData)

So, for thermo libraries and QMTP, we pass in the method as a parameter. Next, this happens:

rmgpy.rmg.model.Species:
def generateThermoData(...):
# Get thermo estimate for saturated form of structure
        try:
            thermoData = stableThermoEstimator(saturatedStruct)
       ...
        molecule.calculateSymmetryNumber()
        thermoData.S298.value_si -= constants.R * math.log(molecule.symmetryNumber)
...

So, when the thermochemistry of the parent molecule originates from a thermolibrary of from QMTP, it is used as such, without removing the symmetry contribution!

Identifying the complexities of the problem

The problem of the additional, unwanted symmetry contribution in the thermochemistry of saturated molecules is caused by using the same method for two distinct applications. E.g.

database.thermo.getThermoDataFromLibraries is used for:

In the former case, we want to the symmetry contribution to be present in the entropy. In the latter case, we don't want the symmetry contribution.

Sometimes the symmetry contribution is not explicitly available as for thermo libraries and we'd need to remove the symmetry number contribution from the entropy of the saturated parent molecule.

Sometimes, the symmetry contribution is calculated expliclity, e.g. for QMTP where we send the atomic coordinates to SYMMETRY.

By the way, we store the symmetry number of a Molecule:

rmgpy.molecule.Molecule:
    def calculateSymmetryNumber(self):
        self.symmetryNumber = calculateSymmetryNumber(self)
        return self.symmetryNumber

Solutions

I'd take this opportunity to suggest solutions, not in the form of code changes, but rather in the form of principles. The idea of a principle is that it always applies.

Principle 1 Do not add a symmetry contribution when the thermochemistry of the radical species originates from a thermo library.

Principle 2 Add a symmetry contribution to the thermochemistry of the radical species when they originate from the HBI method.

Principle 3 Remove the symmetry contribution of a species when it is used as a saturated parent molecule in the HBI method for the estimation of the thermochemistry of radical species.

nickvandewiele commented 9 years ago

See #376 for a fix.

Principle 1, and Principle 2 have been adhered to.

Principle 3 is currently not followed, because the symmetry contribution was not added in the first place in the group additivity method for thermo. To be consistent, we would have to add the symmetry contribution in the GA method, and then remove it again.

rwest commented 9 years ago

I think we used to do precisely what your last point suggests: add the symmetry contribution in the GA method, then subtract it and add the correct one in the HBI method. This was consistent and clear, but @KEHANG found through profiling that it was spending a lot of time calculating symmetry numbers, and in an effort to reduce the number of calls from 3 to 1, reorganized things in https://github.com/GreenGroup/RMG-Py/pull/290, which is why GA, QM, and libraries, are now inconsistent. See the discussion in https://github.com/GreenGroup/RMG-Py/pull/290 which highlights this problem, and https://github.com/GreenGroup/RMG-Py/issues/317 which was meant to remind us (@KEHANG?) to fix it.

If you can finally straighten things out, that would be great!

nickvandewiele commented 9 years ago

I didn't realize that we (including me) already bumped into this 6 months ago. I now recall my own quote:

Nevertheless, I find this a very confusing piece of code that is bound to result in bugs in the future.

This is exactly what happened in 3fe7cb1.

So the way I see it:

How do we avoid redundant computation of symmetry numbers?

We do that by always using the molecule.getSymmetryNumber() method. This method checks if the symmetry has already been calculated. If not, it calls molecule.calcSymmetryNumber().

Is the symmetry contribution added in the group additivity method or not?

The answer can be found in this piece of code:

    def estimateThermoViaGroupAdditivity(self, molecule):
        """
        Return the set of thermodynamic parameters corresponding to a given
        :class:`Molecule` object `molecule` by estimation using the group
        additivity values. If no group additivity values are loaded, a
        :class:`DatabaseError` is raised.
        """
...

        if molecule.isRadical(): # radical species
            thermoData = self.estimateRadicalThermoViaHBI(molecule, self.computeGroupAdditivityThermo)
            return thermoData

        else: # non-radical species
            thermoData = self.computeGroupAdditivityThermo(molecule)
            # Correct entropy for symmetry number
            if not 'saturated' in molecule.props: 
                thermoData.S298.value_si -= constants.R * math.log(molecule.getSymmetryNumber())
            return thermoData

The answer is two-fold:

The method self.computeGroupAdditivityThermo, previously named estimateThermoViaGroupAdditivityForSaturatedStructWithoutSymmetryCorrection simply searches for the atom-centered groups in a molecule.

Conclusions

To me, the real confusion is a semantic one. Although we apply the method estimateThermoViaGroupAdditivity on radicals too, we do not implement a group additive method for radicals. We have a HBI method, for radicals though, which is not the same. The thermo of the parent molecule of a radical can come from a thermo library, even though it is used by a method estimateThermoViaGroupAdditivity. That's bound to mislead people.

rwest commented 9 years ago

Really? estimateThermoViaGroupAdditivity on some radical could do HBI on top of a value from a library? This is even more of a mess than I thought.

nickvandewiele commented 9 years ago

I've judged too quickly. It's not true that estimateThermoViaGroupAdditivity for radicals can use thermo libraries for its parent molecules. It will only use the GA method, because computeGroupAdditivityThermo is called.

rwest commented 9 years ago

@connie recently did some work on this (improving the logic flow for HBI and where it gets the underlying data from) to make it much more sensible and rational (our sketch on the whiteboard of the previous algorithm was quite comical). I'm not sure if she tackled the way we do symmetry corrections at the same time...?

connie commented 9 years ago

@nickvandewiele had brought up this issue with me. I had changed the algorithm so that RMG prioritizes HBI on library values prior to group additivity when QM is off in rmgpy/rmg/model.py. That's how we uncovered the symmetry number error. When applying HBI to either library values or QM generated saturated values, the symmetry error exists. So this is a bug, one that existed for even longer with the QM radical values. I think that it would be good to actually migrate my changes and the prioritization scheme over to rmgpy/data/thermo.py rather than rmgpy/rmg/model.py.

rwest commented 9 years ago

Yes, this is a bug, and I think it was introduced (knowingly) in #290.

I think @nickvandewiele's symmetry number cache (in the 'How do we avoid redundant computation of symmetry numbers?' section above) won't work to speed things up if the saturated parent molecule is made fresh for each radical (which it is). Unless we cache the whole thing as suggested in #377.

connie commented 9 years ago

I believe that this is finally resolved via https://github.com/GreenGroup/RMG-Py/commit/7759b113b238dff9d61823787526fcc34aff8d3a