ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/
Other
374 stars 227 forks source link

Can't construct surface rmgpy.molecule.Molecule() #2603

Open sevyharris opened 4 months ago

sevyharris commented 4 months ago

Bug Description

I'm trying to create a rmgpy.molecule.Molecule() using smiles

rmgpy.molecule.Molecule(smiles='O=C=[X]')

But it fails from an RDKit error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_18284/1886713532.py in <module>
----> 1 rmgpy.molecule.Molecule(smiles='O=C=[X]')
~/rmg/RMG-Py/rmgpy/molecule/molecule.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.molecule.Molecule.__init__()
~/rmg/RMG-Py/rmgpy/molecule/molecule.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.molecule.Molecule.from_smiles()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator.from_smiles()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator.from_smiles()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator._read()
~/rmg/RMG-Py/rmgpy/molecule/translator.cpython-37m-x86_64-linux-gnu.so in rmgpy.molecule.translator._rdkit_translator()
ValueError: Could not interpret the identifier 'O=C=[X]'

Possible Solution

I'm wondering if we can get around RDKit's limitations with the following procedure:

It would look something like adding this to the _rdkit_translator() function in RMG-Py/rmgpy/molecule/translator.py:

from rdkit import Chem
import rmgpy.molecule

COX_smiles = 'O=C=[X]'

rdkitmol = Chem.MolFromSmiles(COX_smiles.replace('X', 'Ar'))
mol = rmgpy.molecule.molecule.Molecule()
output = rmgpy.molecule.converter.from_rdkit_mol(mol, rdkitmol)

lines = output.to_adjacency_list().split('\n')

for i, line in enumerate(lines):
    if 'Ar' in line:
        lines[i] = lines[i].replace('Ar', 'X')
        # remove any extra electron pairs...
        lines[i] = lines[i].replace('p3', 'p0')
        lines[i] = lines[i].replace('p2', 'p0')
        lines[i] = lines[i].replace('p1', 'p0')
adj_list = '\n'.join(lines)
COX = rmgpy.molecule.molecule.Molecule().from_adjacency_list(adj_list)

print(COX.smiles)
print(COX.to_adjacency_list())

Other Possible Solution

One workaround is to just instantiate the molecule using the adjacency list. But sometimes that's annoying- especially since we can use smiles for gas phase molecules without any issue. At the very least, a more descriptive error message is probably required, telling the user that you have to use the adjacency list.

mjohnson541 commented 4 months ago

If the atom swapping doesn't work well. I have an julia code sitting around from several years ago that I wrote for going from smiles to a julia equivalent of the molecule class. If someone were interested in transcoding that to cython I think that would make it trivial to add this feature and any other modifications we like to smiles=>molecule in the future.

sevyharris commented 4 months ago

If the atom swapping doesn't work well. I have an julia code sitting around from several years ago that I wrote for going from smiles to a julia equivalent of the molecule class. If someone were interested in transcoding that to cython I think that would make it trivial to add this feature and any other modifications we like to smiles=>molecule in the future.

I might be interested. It looks like there's some weird recursions going on that are complicating my attempts to implement this at the _rdkit_translator level

github-actions[bot] commented 1 month ago

This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.