ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/
Other
396 stars 228 forks source link

Singlet HCOH creates SMILES for COH #1695

Closed rwest closed 1 year ago

rwest commented 5 years ago

Bug Description

Creating the molecule for singlet HCOH (or loading it from one of the thermo or kinetics libraries) then creating the SMILES string reports [C-]=[OH+] which is missing an H atom; it should be [CH-]=[OH+]

How To Reproduce

In [1]: from rmgpy.species import Species

In [2]: Species().fromAdjacencyList("""
   ...: 1 O u0 p1 c+1 {2,D} {4,S}
   ...: 2 C u0 p1 c-1 {1,D} {3,S}
   ...: 3 H u0 p0 c0 {2,S}
   ...: 4 H u0 p0 c0 {1,S}
   ...: """)
Out[2]: Species(label="", molecule=[Molecule(SMILES="[C-]=[OH+]")], molecularWeight=(30.026,'amu'))

In [3]: from rmgpy.molecule import Molecule

In [4]: print Molecule(SMILES="[C-]=[OH+]").toAdjacencyList()
multiplicity 2
1 C u1 p1 c-1 {2,D}
2 O u0 p1 c+1 {1,D} {3,S}
3 H u0 p0 c0 {2,S}

The SMILES [CH]O creates the triplet version which can do a round-trip to and from SMILES just fine:

In [10]: print Molecule(SMILES="[CH]O").toAdjacencyList()
multiplicity 3
1 C u2 p0 c0 {2,S} {3,S}
2 H u0 p0 c0 {1,S}
3 O u0 p2 c0 {1,S} {4,S}
4 H u0 p0 c0 {3,S}

In [11]: Species().fromAdjacencyList("""
    ...: multiplicity 3
    ...: 1 C u2 p0 c0 {2,S} {3,S}
    ...: 2 H u0 p0 c0 {1,S}
    ...: 3 O u0 p2 c0 {1,S} {4,S}
    ...: 4 H u0 p0 c0 {3,S}
    ...: """)
Out[11]: Species(label="multiplicity 3", molecule=[Molecule(SMILES="[CH]O")], molecularWeight=(30.026,'amu'))

According to the DFT_QCI_thermo thermo library, the singlet is the most stable with ∆Hf=26 kcal/mol and the triplet is ∆Hf=52 kcal/mol.

rwest commented 3 years ago
In [4]: s =Species().from_adjacency_list("""1 O u0 p1 c+1 {2,D} {4,S}
   ...:    ...: 2 C u0 p1 c-1 {1,D} {3,S}
   ...:    ...: 3 H u0 p0 c0 {2,S}
   ...:    ...: 4 H u0 p0 c0 {1,S}
   ...:    ...: """)

In [8]: m = s.molecule[0]

In [10]: print(m.to_adjacency_list())
1 O u0 p1 c+1 {2,D} {4,S}
2 C u0 p1 c-1 {1,D} {3,S}
3 H u0 p0 c0 {2,S}
4 H u0 p0 c0 {1,S}

Using OpenBabel3 (https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2088 ) the openbabel back end works ok:

In [33]: rmgpy.molecule.translator.to_smiles(m, backend='openbabel')
/Users/rwest/opt/anaconda3/envs/rmgob3/lib/python3.7/site-packages/openbabel/__init__.py:14: UserWarning: "import openbabel" is deprecated, instead use "from openbabel import openbabel"
  warnings.warn('"import openbabel" is deprecated, instead use "from openbabel import openbabel"')
Out[33]: '[CH-]=[OH+]'

but the RDKit backend (with RDKit version 2020.03.3.0 from the rmg conda channel) gets it wrong

In [32]: rmgpy.molecule.translator.to_smiles(m, backend='rdkit')
Out[32]: '[C-]=[OH+]'

If you try

In [39]: rmgpy.molecule.converter.to_rdkit_mol(m)

you get

AtomValenceException: Explicit valence for atom # 1 C greater than permitted

but in the translator it does it with sanitize=False option, like this:

In [40]: d = rmgpy.molecule.converter.to_rdkit_mol(m, sanitize=False)

In [41]: d.Debug()
Atoms:
    0 8 O chg: 1  deg: 1 exp: 3 imp: 0 hyb: 0 arom?: 0 chi: 0
    1 6 C chg: -1  deg: 1 exp: 3 imp: 0 hyb: 0 arom?: 0 chi: 0 rad: 2
Bonds:
    0 0->1 order: 2 conj?: 0 aromatic?: 0

here it seems the C has explicit valence of 3 and 2 radicals?

If you call it with the addition of remove_h=False then

In [43]: d = rmgpy.molecule.converter.to_rdkit_mol(m, sanitize=False, remove_h=False)
    ...: 

In [44]: d.Debug()
Atoms:
    0 8 O chg: 1  deg: 2 exp: N/A imp: -1 hyb: 0 arom?: 0 chi: 0
    1 6 C chg: -1  deg: 2 exp: N/A imp: -1 hyb: 0 arom?: 0 chi: 0 rad: 2
    2 1 H chg: 0  deg: 1 exp: N/A imp: -1 hyb: 0 arom?: 0 chi: 0
    3 1 H chg: 0  deg: 1 exp: N/A imp: -1 hyb: 0 arom?: 0 chi: 0
Bonds:
    0 0->1 order: 2 conj?: 0 aromatic?: 0
    1 0->3 order: 1 conj?: 0 aromatic?: 0
    2 1->2 order: 1 conj?: 0 aromatic?: 0

with these errors interspersed every time it tries to calculate explicit valence (listed as N/A above)

[13:06:10] 
****
Pre-condition Violation
getExplicitValence() called without call to calcExplicitValence()
Violation occurred on line 196 in file /Users/glandrum/anaconda3/conda-bld/rdkit_1591931638269/work/Code/GraphMol/Atom.cpp
Failed Expression: d_explicitValence > -1
****
[13:06:10] 
****
Pre-condition Violation
getExplicitValence() called without call to calcExplicitValence()
Violation occurred on line 196 in file /Users/glandrum/anaconda3/conda-bld/rdkit_1591931638269/work/Code/GraphMol/Atom.cpp
Failed Expression: d_explicitValence > -1
****
[13:06:10] 
****
Pre-condition Violation
getExplicitValence() called without call to calcExplicitValence()
Violation occurred on line 196 in file /Users/glandrum/anaconda3/conda-bld/rdkit_1591931638269/work/Code/GraphMol/Atom.cpp
Failed Expression: d_explicitValence > -1
****
[13:06:10] 
****
Pre-condition Violation
getExplicitValence() called without call to calcExplicitValence()
Violation occurred on line 196 in file /Users/glandrum/anaconda3/conda-bld/rdkit_1591931638269/work/Code/GraphMol/Atom.cpp
Failed Expression: d_explicitValence > -1
****

The next place I would look is in rmgpy/molecule/converter.py around line 76 in to_rdkit_mol where we have

        if atom.element.symbol == 'C' and atom.lone_pairs == 1 and mol.multiplicity == 1: rd_atom.SetNumRadicalElectrons(
            2)

which was introduced by Nick in https://github.com/ReactionMechanismGenerator/RMG-Py/commit/668ae661e640775ec85b6549b32da02c1741ac93 specifically to deal with singlet carbenes.

I think the problem is that line doesn't consider the charge, and doesn't increment the number of radical electrons, it just sets it.

but it might also be worth upgrading RDKit first.

github-actions[bot] commented 1 year ago

This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.