SMILES in RMG

Note: This is not an issue about a bug in the code. I have a list of questions related to the usage of SMILES in RMG. Somewhat related to issue #156.

1) In the annotated output files, most species labels are SMILES, but some aren't, e.g. H2, N2. Could you explain why/when the SMILES format is not used?
2) Isomeric SMILES can distinguish cis and trans species. Is RMG (rate rules, thermo) also capable of distinguishing such isomers?
3) Am I right to think that SMILES should always be converted to the RMG-specific adjacency list format for rate rules and group additivity methods to work, but not necessarily for QM calculations?
4) Maybe the core of the issue is the conversion of SMILES to adjacency list and vice versa. In these conversions, I guess no information is lost, so one should get the same answer when the conversion is reversed. Is this correct? Are there exceptions?

My attempts to answer:

1) In the annotated output files, most species labels are SMILES, but some aren't, e.g. H2, N2. Could you explain why/when the SMILES format is not used?

For the output species labels, see chemkin.getSpeciesIdentifier(species) in chemkin.py. If the species is in your input file (and perhaps the seed mechanism?), the given name is used; otherwise it tries to use the SMILES as the species label (hence methane is "C"). But if the SMILES contains a character that can't go in a Chemkin name, such as [, ] or = (so radicals, other bracket atoms such as those in [H][H], and double bonds), it uses the chemical formula instead. Hence "H2", "CH3".
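The label-selection rule described above can be sketched as follows. This is a hypothetical simplification, not RMG's actual getSpeciesIdentifier: the function name `choose_label`, its arguments, and the forbidden-character set (limited to the characters mentioned above) are assumptions for illustration only.

```python
from typing import Optional

# Assumed set of characters that cannot appear in a Chemkin species name.
# Only the characters mentioned in this thread; RMG's real rule may differ.
FORBIDDEN_CHARS = set("[]=")


def choose_label(smiles: str, formula: str, input_name: Optional[str] = None) -> str:
    """Hypothetical sketch of how an output species label could be chosen."""
    # 1. A name given in the input file always wins.
    if input_name:
        return input_name
    # 2. Otherwise use the SMILES, provided it is a legal Chemkin name.
    if not any(ch in FORBIDDEN_CHARS for ch in smiles):
        return smiles
    # 3. Fall back to the chemical formula.
    return formula


print(choose_label("C", "CH4"))                             # prints C
print(choose_label("[CH3]", "CH3"))                         # prints CH3
print(choose_label("[H][H]", "H2"))                         # prints H2
print(choose_label("CCO", "C2H6O", input_name="ethanol"))   # prints ethanol
```

The ordering (input name, then SMILES, then formula) mirrors the explanation above; the exact set of disallowed characters is the part most likely to differ from RMG.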

2) Isomeric SMILES can distinguish cis, trans species. Is RMG (rate rules, thermo) also capable of distinguishing such isomers?

No. RMG represents species as simple graphs that carry no information about the configuration around a double bond. As a result, when RMG generates a SMILES it removes all stereochemistry indicators.
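To see why cis and trans isomers collapse into one species once stereochemistry is dropped, here is a naive string-level sketch. It is not how RMG removes stereo information; the helper `strip_stereo` is hypothetical, and it only handles the `/` and `\` bond-direction symbols and simple `@`/`@@` tetrahedral markers (not extended chirality classes such as `@TH1`).

```python
import re

# "/" and "\" mark bond directions (cis/trans about a double bond);
# "@" and "@@" mark tetrahedral chirality in bracket atoms.
_BOND_DIRECTION = re.compile(r"[/\\]")
_TETRAHEDRAL = re.compile(r"@{1,2}")


def strip_stereo(smiles: str) -> str:
    """Remove cis/trans and simple tetrahedral stereo markers from a SMILES string."""
    smiles = _BOND_DIRECTION.sub("", smiles)
    return _TETRAHEDRAL.sub("", smiles)


trans_2_butene = "C/C=C/C"
cis_2_butene = "C/C=C\\C"   # Python escape for C/C=C\C
print(strip_stereo(trans_2_butene))  # prints CC=CC
print(strip_stereo(cis_2_butene))    # prints CC=CC
```

Both isomers end up as the same string, which is the situation a stereo-free graph representation is in.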

3) Am I right to think that SMILES should always be converted to the RMG-specific adjacency list format for rate rules and group additivity methods to work, but not necessarily for QM calculations?

I don't fully understand the question, but I think the answer to the first half is "yes". RMG works internally with RMG Molecules, which map directly and unambiguously onto the adjacency list syntax. SMILES is just a convenience for input/output: as soon as you enter a SMILES, it is turned into a Molecule object (an adjacency list, if you prefer). The QM calculations within RMG also work from Molecule objects. If you wanted to hack your own script using parts of RMG, you might be able to avoid this and go straight from SMILES to an RDKit molecule (RDMol), but then you'd basically be using RDKit Molecules rather than RMG Molecules, and I'm not sure of the benefit.
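To make the "graph" view concrete: each line of an RMG adjacency list gives an atom index, element, unpaired electrons (`u`), lone pairs (`p`), charge (`c`) and the bonded neighbours with bond orders. The toy parser below is a hypothetical sketch, not RMG's Molecule class or its parser, and it assumes atoms carry no extra labels and the text has no header lines.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Atom:
    element: str
    unpaired: int    # "u" field: unpaired (radical) electrons
    lone_pairs: int  # "p" field
    charge: int      # "c" field
    bonds: Dict[int, str] = field(default_factory=dict)  # neighbour index -> bond order


def parse_adjacency_list(text: str) -> Dict[int, Atom]:
    """Toy parser for unlabelled RMG-style adjacency list lines."""
    atoms: Dict[int, Atom] = {}
    for line in text.strip().splitlines():
        tokens = line.split()
        index, element = int(tokens[0]), tokens[1]
        # Fields look like "u0", "p0", "c0" (or "c+1"/"c-1"); strip the prefix letter.
        fields = {tok[0]: int(tok[1:]) for tok in tokens[2:5]}
        atom = Atom(element, fields["u"], fields["p"], fields["c"])
        # Remaining tokens are bonds such as "{2,S}".
        for bond in tokens[5:]:
            neighbour, order = bond.strip("{}").split(",")
            atom.bonds[int(neighbour)] = order
        atoms[index] = atom
    return atoms


METHANE = """
1 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
2 H u0 p0 c0 {1,S}
3 H u0 p0 c0 {1,S}
4 H u0 p0 c0 {1,S}
5 H u0 p0 c0 {1,S}
"""

mol = parse_adjacency_list(METHANE)
print(mol[1].element, len(mol[1].bonds))  # prints C 4
```

Everything a rate rule or group-additivity lookup needs (connectivity, radicals, lone pairs, charges) is explicit in this structure, which is why SMILES input is converted to it first.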

4) Maybe the core of the issue is the conversion of SMILES to adjacency list and vice versa.

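Probably. RDKit cannot cope with the way RMG handles variable-valence nitrogen, so it cannot be used to parse or generate SMILES for N-containing species, and OpenBabel is used for those instead. OpenBabel does not have the same canonical SMILES generator as RDKit, so canonical SMILES written for N-containing species may not follow the same conventions as those written for other species.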
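The toolkit dispatch described above can be sketched as a simple rule. The function `smiles_backend` and its element-based test are hypothetical illustrations of the idea, not RMG's actual selection code.

```python
from typing import Iterable


def smiles_backend(element_symbols: Iterable[str]) -> str:
    """Hypothetical rule: use OpenBabel for N-containing species, RDKit otherwise."""
    return "openbabel" if "N" in set(element_symbols) else "rdkit"


print(smiles_backend(["C", "H", "O"]))  # prints rdkit
print(smiles_backend(["C", "H", "N"]))  # prints openbabel
```

The practical consequence is that SMILES strings are only safe to compare as strings when the same toolkit wrote both of them; otherwise compare the molecules as graphs.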

In these conversions, I guess no information is lost, so one should be able to get the same answer when the conversion is reversed.

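Ideally, yes. But SMILES cannot represent some electronic states: for example, "[CH2]" does not say whether methylene is a singlet or a triplet. Conversely, the adjacency list cannot represent stereochemistry.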
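A toy illustration of why the adjacency list to SMILES direction can lose information: two species that differ only in their electron bookkeeping are written to the same SMILES. The record type and `toy_smiles` writer below are hypothetical and handle only a single heavy atom written as a bracket atom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OneHeavyAtomSpecies:
    """Toy record holding a few per-atom fields an adjacency list stores."""
    element: str
    hydrogens: int
    unpaired: int    # radical electrons ("u")
    lone_pairs: int  # "p"


def toy_smiles(species: OneHeavyAtomSpecies) -> str:
    """Write a bracket-atom SMILES; unpaired and lone_pairs are never used."""
    n = species.hydrogens
    if n == 0:
        h = ""
    elif n == 1:
        h = "H"
    else:
        h = f"H{n}"
    return f"[{species.element}{h}]"


singlet_ch2 = OneHeavyAtomSpecies("C", 2, unpaired=0, lone_pairs=1)
triplet_ch2 = OneHeavyAtomSpecies("C", 2, unpaired=2, lone_pairs=0)
print(toy_smiles(singlet_ch2), toy_smiles(triplet_ch2))  # prints [CH2] [CH2]
```

Going back from "[CH2]" therefore requires a guess about the electronic state, so the round trip is not guaranteed to return the original species.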

Is this correct? Are there exceptions?

There are always bugs. For example, until very recently, parsing the SMILES for elemental carbon "[C]" gave a quintet tetraradical, which then generated the SMILES for methane, "C". I am not aware of any open ones on the latest master branch now that https://github.com/ReactionMechanismGenerator/RMG-Py/pull/949 has been merged, but (1) that merge came after the last binary release and (2) there are always more bugs we haven't found.
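Bugs like the "[C]" one can be caught with a round-trip regression check. The harness below is a hypothetical sketch: in practice `parse` and `write` would wrap RMG's own SMILES conversion routines, and the inputs should already be in the writer's canonical form so that a plain string comparison is meaningful. The toy writer deliberately drops brackets to mimic the old bug.

```python
from typing import Callable, Iterable, List, Tuple


def round_trip_failures(
    smiles_list: Iterable[str],
    parse: Callable[[str], object],
    write: Callable[[object], str],
) -> List[Tuple[str, str]]:
    """Return (input, output) pairs where parse-then-write changes the SMILES."""
    failures = []
    for smiles in smiles_list:
        result = write(parse(smiles))
        if result != smiles:
            failures.append((smiles, result))
    return failures


# Hypothetical stand-ins: the parser keeps the string and the buggy writer
# drops brackets, so "[C]" comes back as "C" (methane).
def toy_parse(smiles: str) -> str:
    return smiles


def buggy_write(mol: str) -> str:
    return mol.replace("[", "").replace("]", "")


print(round_trip_failures(["C", "[C]", "CCO"], toy_parse, buggy_write))
# prints [('[C]', 'C')]
```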

ReactionMechanismGenerator/RMG-Py, issue #972: SMILES in RMG