ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/
Other
394 stars 227 forks source link

Inconsistent species between InChI, SMILES and adjlist for Si species #301

Closed bslakman closed 1 year ago

bslakman commented 10 years ago

The issue that I am having (mentioned in #286 ) may be related to this. I am having a problem where RMG is trying to decrement lone pairs on a species which doesn't have any. In trying to debug, I found that the results of toAdjacencyList(), toInChI() and toSMILES() are different for the species in question.

The SMILES string is [SiH3][SiH5] The InChI=1S/H6Si2/c1-2/h1-2H3 which is apparently disilane, Si2H6

And the adjacency list...

1 1 Si u0 p0 c0 {2,S} {3,S} {4,S} {5,S} {6,S} {7,S} 2 3 Si u0 p0 c0 {1,S} {8,S} {9,S} {10,S} 3 H u0 p0 c0 {1,S} 4 H u0 p0 c0 {1,S} 5 H u0 p0 c0 {1,S} 6 H u0 p0 c0 {1,S} 7 *2 H u0 p0 c0 {1,S} 8 H u0 p0 c0 {2,S} 9 H u0 p0 c0 {2,S} 10 H u0 p0 c0 {2,S}

I'm not sure where in the code Si is allowed to be hexavalent, I'm pretty sure that everywhere it is treated as carbon, tetravalent only, unless I'm missing something.

nyee commented 10 years ago

Any idea where this species is coming from? Is it created from a rate rule or a reaction library?

bslakman commented 10 years ago

It is created from a reaction family. If you checkout my new-style-silicon-hydrides branches on RMG-Py and database you can reproduce the error ( @rwest @connie in the future is the best way to state this to open a pull request?)

rwest commented 10 years ago

Trouble with creating a pull request is that it suggests you want us to merge those changes, which I believe is not yet the case. Probably just easiest to mention in the issue how someone can reproduce it. There is no harm in opening a general pull request for this topic branch though - but put in the title or top comment "Do not merge yet!". Then you can keep rebasing it to your heart's content until it is ready, but collect discussion and comments on it in one place. (If there are lots of separate issues though, you may not want all the discussion in one place!)

bslakman commented 10 years ago

I think the way RDKit is handling Si may be a problem/ not compatible with how we represent Si in RMG, and maybe that is why we are applying reaction recipes inappropriately in some places, but recognizing that we can't apply it in others.

For Si, I tried to reverse make an adjacency list for the [SiH3][SiH5] species that is having a problem, called 'struct':

ipdb> struct
Molecule(SMILES="[SiH3][SiH5]", multiplicity=-187)
ipdb> mol1 = Molecule(SMILES="[SiH3][SiH5]")
ipdb> mol1.toAdjacencyList()
'1  Si u0 p0 c0 {2,S} {3,S} {4,S} {5,S}\n2  Si u0 p-2 c0 {1,S} {6,S} {7,S} {8,S} {9,S} {10,S}\n3  H  u0 p0 c0 {1,S}\n4  H  u0 p0 c0 {1,S}\n5  H  u0 p0 c0 {1,S}\n6  H  u0 p0 c0 {2,S}\n7  H  u0 p0 c0 {2,S}\n8  H  u0 p0 c0 {2,S}\n9  H  u0 p0 c0 {2,S}\n10 H  u0 p0 c0 {2,S}\n'
ipdb> mol1.isIsomorphic(struct)
False

It allows the species to be created, but it is not isomorphic with 'struct'. Also the species has multiplicity -187 and when we make the adjacency list, atom 2 has a negative number of lone pairs!

In contrast, for carbon, we raise an error when we try to make a species from the analogous adjacency list:

ipdb> mol2 = Molecule(SMILES="[CH3][CH5]")
[15:30:20] Explicit valence for atom # 1 C, 6, is greater than permitted
*** ValueError: Could not interpret the SMILES string '[CH3][CH5]'

I'm not sure because I've never worked with RDKit, but maybe some fix is needed, as we do with helium in the fromSMILES() method. I do not want to allow hexavalent Si right now. I think it's simpler to use tetravalent Si only, instead of making different atom types that will not be used.

rwest commented 10 years ago

RDKit allowing 6-valent Si shouldn't cause problems, if RMG wasn't making them. When debugging erroneous "molecules" that don't make sense, avoid SMILES strings and instead use the full Adjacency List. The SMILES generated from something that's chemically wrong will just mislead you.

Probably we still don't have enough error checks in here, for things like number of lone pairs is not negative, etc. Where these checks should exist is another question: whether when creating a molecule by reading from an adjacency list, or when applying a reaction template, or whether there's a "check self is valid" method that we call after both of these events.

github-actions[bot] commented 1 year ago

This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.