chembl / ChEMBL_Structure_Pipeline

ChEMBL database structure pipelines
MIT License

Standardizer writes atomic properties to mol #52

Open gayverjr opened 1 year ago

gayverjr commented 1 year ago

Is standardize_mol writing atomic properties to the mol object expected behavior? Here is a situation I came across:

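from rdkit import Chem
from chembl_structure_pipeline import standardizer
# Sulfoxide with an enhanced-stereo AND group (&1) on the sulfur, atom index 6
smi = "CC1COCC[S@]1=O |&1:6|"
mol = Chem.MolFromSmiles(smi)
mol = standardizer.standardize_mol(mol)
# The CXSMILES written here carries atomProp entries that the input did not have
print(Chem.MolToCXSmiles(mol))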

> CC1COCC[S@@+]1[O-] |atomProp:0.react_atom_idx.0:1.old_mapno.4:1.react_atom_idx.1:1.was_dummy.1:2.react_atom_idx.2:3.react_atom_idx.3:4.react_atom_idx.4:5.old_mapno.1:5.react_atom_idx.5:5.was_dummy.1:6.old_mapno.2:6.react_atom_idx.6:7.old_mapno.3:7.react_atom_idx.7,&1:6|

This seems to be valid CXSMILES, and one can always clear out the properties, but I am curious whether this behavior is intended.
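For reference, the "clear out the properties" workaround could look roughly like the sketch below. The helper name `clear_atom_props` and the `demo` function are hypothetical, not part of chembl_structure_pipeline or RDKit; the property names are simply the ones that appear in the output above, and the only RDKit calls assumed are `Mol.GetAtoms`, `Atom.HasProp`, and `Atom.ClearProp`.

```python
# Property names copied from the CXSMILES output in this issue; other
# inputs might pick up different names, so treat this tuple as an assumption.
REACTION_PROPS = ("react_atom_idx", "old_mapno", "was_dummy")


def clear_atom_props(mol, names=REACTION_PROPS):
    """Remove the named atom properties in place; return how many were removed."""
    removed = 0
    for atom in mol.GetAtoms():
        for name in names:
            # ClearProp is only called for properties that are actually set.
            if atom.HasProp(name):
                atom.ClearProp(name)
                removed += 1
    return removed


def demo():
    """Reproduce the case from this issue, then strip the leftover properties."""
    # Imported here so the helper above can be used without these packages.
    from rdkit import Chem
    from chembl_structure_pipeline import standardizer

    mol = Chem.MolFromSmiles("CC1COCC[S@]1=O |&1:6|")
    mol = standardizer.standardize_mol(mol)
    clear_atom_props(mol)
    return Chem.MolToCXSmiles(mol)
```

Calling `demo()` should return CXSMILES without the `atomProp` section for these three names. The `&1:6` part is stored as a stereo group on the molecule rather than as an atom property, so this helper does not touch it.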

greglandrum commented 1 year ago

Hi @gayverjr. I believe that this is something that the RDKit itself should be cleaning up. Would you mind creating an issue in the RDKit repository for this?