Closed hannbus closed 7 months ago
Hi @hannbus,
Looking at the error, it is clear that RDKit was unable to generate a conformer for one of the molecules. We had a similar issue in our cheminformatics-microservice project (https://github.com/Steinbeck-Lab/cheminformatics-microservice/tree/main), where I wrote the following code. Perhaps it will be useful?
    from rdkit import Chem
    from rdkit.Chem import AllChem


    def get_3d_conformers(molecule: Chem.Mol, depict: bool = True) -> str:
        """
        Generate 3D coordinates for an RDKit Mol object and return them as a Mol block.

        Args:
            molecule (Chem.Mol): RDKit molecule object.
            depict (bool, optional): If True, the Mol block keeps the explicit hydrogen atoms.
                If False, hydrogen atoms are removed before the Mol block is written.

        Returns:
            str: 3D structure in Mol block format (None if no molecule is given).
        """
        if molecule:
            molecule = Chem.AddHs(molecule)
            AllChem.EmbedMolecule(molecule, maxAttempts=5000, useRandomCoords=True)
            try:
                AllChem.MMFFOptimizeMolecule(molecule)
            except Exception:
                # Optimization failed (e.g. embedding produced no conformer): embed again.
                AllChem.EmbedMolecule(molecule, maxAttempts=5000, useRandomCoords=True)
            if depict:
                return Chem.MolToMolBlock(molecule)
            else:
                molecule = Chem.RemoveHs(molecule)
                return Chem.MolToMolBlock(molecule)
Responding to Kohulan: I think generating optimized 3D coordinates is a bit much here. Isn't there a "layouter" that can generate 2D coordinates on the fly?
We can use this if conformers are not needed:
    from rdkit import Chem
    from rdkit.Chem import AllChem


    def get_2d_mol(molecule: Chem.Mol) -> str:
        """
        Generate a 2D Mol block representation of an RDKit Mol object.

        Args:
            molecule (Chem.Mol): RDKit molecule object.

        Returns:
            str: 2D Mol block representation (None if no molecule is given).
        """
        if molecule:
            # Compute 2D layout coordinates in place, then serialize them.
            AllChem.Compute2DCoords(molecule)
            molfile = Chem.MolToMolBlock(molecule)
            return molfile
If the second option fixes the issue, I would prefer it, thanks.
The molfile-based solution described above did not work for me (or I did not understand it correctly). But it would be enough to generate 2D coordinates for small molecules using Compute2DCoords BEFORE getting the conformer atom positions (as in the code section below). Is this the way to go, or do I need to do it differently?
    for mol in to_draw:
        AllChem.Compute2DCoords(mol)
        atom0_pos = [
            mol.GetConformer().GetAtomPosition(0).x,
            mol.GetConformer().GetAtomPosition(0).y,
            mol.GetConformer().GetAtomPosition(0).z,
        ]
        atom1_pos = [
            mol.GetConformer().GetAtomPosition(1).x,
            mol.GetConformer().GetAtomPosition(1).y,
            mol.GetConformer().GetAtomPosition(1).z,
        ]
        if atom0_pos == atom1_pos:
            AllChem.Compute2DCoords(mol)
I think the method AllChem.Compute2DCoords(mol) is exactly what we need for depicting molecules that have no coordinates upon import. But why do you compute the coords, test whether they are the same for the first two atoms, and re-compute them if that is the case? I would first test whether coords are given and if not, generate them.
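A minimal sketch of that "test first, generate only if missing" idea might look like the following. The helper name `ensure_2d_coords` is hypothetical and not part of the project. It also assumes that a conformer whose coordinates are all zero (as can happen when a file carries no real coordinates) should be treated the same as having no conformer at all.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def ensure_2d_coords(mol: Chem.Mol) -> bool:
    """Compute 2D coordinates only if `mol` has none.

    Hypothetical helper: returns True if coordinates were generated,
    False if the existing ones were kept.
    """
    if mol.GetNumConformers() > 0:
        # GetPositions() returns an (n_atoms, 3) NumPy array.
        positions = mol.GetConformer().GetPositions()
        # Assumption: all-zero coordinates mean "no coordinates given".
        if positions.any():
            return False
    AllChem.Compute2DCoords(mol)
    return True


# Molecules parsed from SMILES carry no conformer, so coordinates are generated.
mol = Chem.MolFromSmiles("CCO")
generated = ensure_2d_coords(mol)
first_atom = mol.GetConformer().GetAtomPosition(0)
```

With a check like this, `mol.GetConformer()` is only called once a conformer is known to exist, and coordinates that came with the input file are left untouched.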
@hannbus are you going to continue working on this or should we take over? Do you have any non-pushed local changes?
When importing the dataset not from .sdf but from .smi or .txt files, draw_molecules is failing due to a Value
This should not be too hard to fix, I think. Should I do that in the next days sometime?