ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/
Other
394 stars 227 forks source link

singlet carbene molecules have wrong identifiers (SMILES, InChI) #372

Closed nickvandewiele closed 9 years ago

nickvandewiele commented 9 years ago

Since the recent molecule representation update which transformed the way we represent singlet carbene molecules, a conversion problem has arisen.

    ch2_s = '''
    multiplicity 1
    1 C u0 p1 c0 {2,S} {3,S}
    2 H u0 p0 c0 {1,S}
    3 H u0 p0 c0 {1,S}
    '''

    ch2_t = '''
    multiplicity 3
    1 C u2 p0 c0 {2,S} {3,S}
    2 H u0 p0 c0 {1,S}
    3 H u0 p0 c0 {1,S}
    '''

We execute the following script:

    mol = Molecule().fromAdjacencyList(adj_list)
    spc = Species(molecule=[mol])

    print 'adj list:\n ', spc.toAdjacencyList()
    print 'aug inchi: ', spc.getAugmentedInChI()
    print 'smiles: ', spc.molecule[0].toSMILES()

Converting the triplet carbene into identifiers goes OK:

adj list:
  [CH2]
multiplicity 3
1 C u2 p0 c0 {2,S} {3,S}
2 H u0 p0 c0 {1,S}
3 H u0 p0 c0 {1,S}

aug inchi:  InChI=1S/CH2/h1H2/mult3
smiles:  [CH2]

RDKit is clever enough to handle multiple unpaired electrons.

However, according to the new scheme, the singlet carbene does not have unpaired electrons anymore. We store them in a lone pair now instead. You get this now:

adj list:
  C
1 C u0 p1 c0 {2,S} {3,S}
2 H u0 p0 c0 {1,S}
3 H u0 p0 c0 {1,S}

aug inchi:  InChI=1S/CH4/h1H4
smiles:  C

Not only does the RDKit conversion go wrong, resulting in too many hydrogens for carbene (4 instead of 2), nowhere do we store the multiplicity of carbene in the augmented inchi. The latter would currently only sum the number of radical electrons. In case of singlet carbene this returns 0, and as such is not appended to the inchi string.

SMILES, and InChI identifiers are currently only used in few occasions in RMG:

Practically speaking,

The current repercussions are limited. Labels are not used to compare species. Thermo for carbene will never originate from QM.

However, any new design features that would utilize string identifiers like SMILES or InChI would suffer from our decision of using lone pairs for singlet carbenes.

rwest commented 9 years ago

Safe conversion to RDKit molecules for SMILES and image generation is relied upon by many other tools (parts of the website, the importer script, output log generation, etc.) that even if the kinetic model from an RMG run is not in error, this is still a real error. Is there any way to represent carbene safely in RDKit, and we just need to fix up our Molecule.toRDKitMol() method? At the moment it's basically just

        for index, atom in enumerate(self.vertices):
            rdAtom = Chem.rdchem.Atom(atom.element.symbol)
            rdAtom.SetNumRadicalElectrons(atom.radicalElectrons)
            rdkitmol.AddAtom(rdAtom)
            rdAtomIndices[atom] = index

and makes no effort to tell RDKit about the lone pair.

nickvandewiele commented 9 years ago

At first sight, I don't see how to copy the lonePairs attribute from an RMG atom to an RDKit molecule, which explains why this wasn't done in the first place in Molecule.toRDKitMol().

nickvandewiele commented 9 years ago

An overview of the RDKit Atom methods:

rdAtom.ClearProp                 rdAtom.GetDoubleProp             rdAtom.GetIsAromatic             rdAtom.GetNumImplicitHs          rdAtom.GetSymbol                 rdAtom.IsInRing                  rdAtom.SetDoubleProp             rdAtom.SetMonomerInfo
rdAtom.DescribeQuery             rdAtom.GetExplicitValence        rdAtom.GetIsotope                rdAtom.GetNumRadicalElectrons    rdAtom.GetTotalDegree            rdAtom.IsInRingSize              rdAtom.SetFormalCharge           rdAtom.SetNoImplicit
rdAtom.GetAtomicNum              rdAtom.GetFormalCharge           rdAtom.GetMass                   rdAtom.GetOwningMol              rdAtom.GetTotalNumHs             rdAtom.Match                     rdAtom.SetHybridization          rdAtom.SetNumExplicitHs
rdAtom.GetBonds                  rdAtom.GetHybridization          rdAtom.GetMonomerInfo            rdAtom.GetPDBResidueInfo         rdAtom.GetTotalValence           rdAtom.NeedsUpdatePropertyCache  rdAtom.SetIntProp                rdAtom.SetNumRadicalElectrons
rdAtom.GetBoolProp               rdAtom.GetIdx                    rdAtom.GetNeighbors              rdAtom.GetProp                   rdAtom.HasProp                   rdAtom.SetAtomicNum              rdAtom.SetIsAromatic             rdAtom.SetProp
rdAtom.GetChiralTag              rdAtom.GetImplicitValence        rdAtom.GetNoImplicit             rdAtom.GetPropNames              rdAtom.HasQuery                  rdAtom.SetBoolProp               rdAtom.SetIsotope                rdAtom.UpdatePropertyCache
rdAtom.GetDegree                 rdAtom.GetIntProp                rdAtom.GetNumExplicitHs          rdAtom.GetSmarts                 rdAtom.InvertChirality           rdAtom.SetChiralTag              rdAtom.SetMass
nickvandewiele commented 9 years ago

373 provides a rather patchy fix to the conversion of lone pairs from an RMG molecule into a RDKit one for the special case of singlet carbene atoms.