Steinbeck-Lab / ChemIcal-DatasEt-comparatoR

ChemIcal DatasEt comparatoR (CIDER) is a Python package and ready-to-use Jupyter Notebook workflow which primarily utilizes RDKit to compare two or more chemical structure datasets (SD files) in different aspects, e.g. size, overlap, molecular descriptor distributions, chemical space clustering, etc., most of which can be visually inspected.
Other
7 stars 4 forks source link

draw_molecules with smi input does not work #75

Closed hannbus closed 7 months ago

hannbus commented 10 months ago

When importing the dataset not from .sdf but from .smi or .txt files, draw_molecules is failing due to a Value

ValueError                                Traceback (most recent call last)
[c:\Users\hannah\Desktop\Data\Git\Not](file:///C:/Users/hannah/Desktop/Data/Git/Not) for upload\CIDER_Workflow_Copy.ipynb Cell 19 in <cell line: 1>()
----> [1](vscode-notebook-cell:/c%3A/Users/hannah/Desktop/Data/Git/Not%20for%20upload/CIDER_Workflow_Copy.ipynb#X23sZmlsZQ%3D%3D?line=0) cider.draw_molecules(testdict, number_of_mols = 20, mols_per_row = 5)

File [c:\Users\hannah\.conda\envs\CDC_neu\lib\site-packages\CIDER\cider.py:473](file:///C:/Users/hannah/.conda/envs/CDC_neu/lib/site-packages/CIDER/cider.py:473), in ChemicalDatasetComparator.draw_molecules(self, all_dicts, number_of_mols, mols_per_row, image_size, data_type, figsize, fontsize_title, fontsize_subtitle)
    470     to_draw.append(all_dicts[single_dict][self.import_keyname][i])
    471 for mol in to_draw:
    472     atom0_pos = [
--> 473         mol.GetConformer().GetAtomPosition(0).x,
    474         mol.GetConformer().GetAtomPosition(0).y,
    475         mol.GetConformer().GetAtomPosition(0).z,
    476     ]
    477     atom1_pos = [
    478         mol.GetConformer().GetAtomPosition(1).x,
    479         mol.GetConformer().GetAtomPosition(1).y,
    480         mol.GetConformer().GetAtomPosition(1).z,
    481     ]
    482     if atom0_pos == atom1_pos:

ValueError: Bad Conformer Id

This should not be too hard to fix, I think. Should I do that in the next days sometime?

Kohulan commented 9 months ago

Hi @hannbus ,

Upon examining the error, it was clear that RDKit was unable to generate the conformer for a given molecule. We had a similar issue in our https://github.com/Steinbeck-Lab/cheminformatics-microservice/tree/main, where I wrote the following code. Perhaps this will be useful?


def get_3d_conformers(molecule: any, depict=True) -> Chem.Mol:
    """
    Convert a SMILES string to an RDKit Mol object with 3D coordinates.

    Args:
        molecule (Chem.Mol): RDKit molecule object.
        depict (bool, optional): If True, returns the molecule's 3D structure in MolBlock format.
            If False, returns the 3D molecule without hydrogen atoms.

    Returns:
        str or rdkit.Chem.rdchem.Mol: If `depict` is True, returns the 3D structure in MolBlock format.
            Otherwise, returns an RDKit Mol object.

    """
    if molecule:
        molecule = Chem.AddHs(molecule)
        AllChem.EmbedMolecule(molecule, maxAttempts=5000, useRandomCoords=True)
        try:
            AllChem.MMFFOptimizeMolecule(molecule)
        except Exception:
            AllChem.EmbedMolecule(molecule, maxAttempts=5000, useRandomCoords=True)
        if depict:
            return Chem.MolToMolBlock(molecule)
        else:
            molecule = Chem.RemoveHs(molecule)
            return Chem.MolToMolBlock(molecule)
JonasSchaub commented 9 months ago

Responding to Kohulan: I think generating optimized 3D coordinates is a bit much here. Isn't there a "layouter" that can generate 2D coordinates on the fly?

Kohulan commented 9 months ago

Responding to Kohulan: I think generating optimized 3D coordinates is a bit much here. Isn't there a "layouter" that can generate 2D coordinates on the fly?

We can use this if conformers are not needed:

def get_2d_mol(molecule: any) -> str:
    """
    Generate a 2D Mol block representation from a given SMILES string.

    Args:
        molecule (Chem.Mol): RDKit molecule object.

    Returns:
        str: 2D Mol block representation.
        If an error occurs during SMILES parsing, an error message is returned.
    """

    if molecule:
        AllChem.Compute2DCoords(molecule)
        molfile = Chem.MolToMolBlock(molecule)
        return molfile
JonasSchaub commented 9 months ago

If the second option fixes the issue, I would prefer it, thanks.

hannbus commented 9 months ago

A solution with using the molfile as described, did not work for me (or I did not understand it the right way). But it would be enough to generate 2D coordinates for small molecule using Compure2DCoords BEFORE getting the Conformer atom position (as in the code section below). Is this the way to go, or do I need to do it differently?

for mol in to_draw:
                AllChem.Compute2DCoords(mol)
                atom0_pos = [
                    mol.GetConformer().GetAtomPosition(0).x,
                    mol.GetConformer().GetAtomPosition(0).y,
                    mol.GetConformer().GetAtomPosition(0).z,
                ]
                atom1_pos = [
                    mol.GetConformer().GetAtomPosition(1).x,
                    mol.GetConformer().GetAtomPosition(1).y,
                    mol.GetConformer().GetAtomPosition(1).z,
                ]
                if atom0_pos == atom1_pos:
                    AllChem.Compute2DCoords(mol)
JonasSchaub commented 9 months ago

I think the method AllChem.Compute2DCoords(mol) is exactly what we need for depicting molecules that have no coordinates upon import. But why do you compute the coords, test whether they are the same for the first two atoms, and re-compute them if that is the case? I would first test whether coords are given and if not, generate them.

JonasSchaub commented 9 months ago

@hannbus are you going to continue working on this or should we take over? Do you have any non-pushed local changes?