dan2097 / opsin

Open Parser for Systematic IUPAC Nomenclature. Chemical name to structure conversion
https://opsin.ch.cam.ac.uk
MIT License

wrong enantiomer when converting to CML #212

Open karpanGit opened 1 year ago

karpanGit commented 1 year ago

Hello,

we have been trying to convert the name

(3R)-1,1,3-trimethyl-2,3-dihydro-1H-inden-4-amine

to a chemical structure. We do not get consistent behaviour when the conversion is to (extended) smiles and CML. In particular the conversion to smiles generates the correct enantiomer. The conversion to CML generates the wrong enantiomer, and it addition it adds explicit hydrogens that is not desired. The generation of the wrong enantiomer may be related to the fact that the chiral centre is in a ring. We observed this with other ring structures.

Code to reproduce the issue

NameToStructure nts = NameToStructure.getInstance();
NameToStructureConfig n2sConfig = new NameToStructureConfig();

OpsinResult result = nts.parseChemicalName(c_name, n2sConfig);
List<OpsinWarning> opsinWarnings = result.getWarnings();
List<String> warningMessages = opsinWarnings.stream().map(OpsinWarning::getMessage).collect(Collectors.toList());
List<String> warningTypes = opsinWarnings.stream().map(OpsinWarning::getType).map(OpsinWarningType::getExplanation).collect(Collectors.toList());

out_smiles = result.getSmiles(SmilesOptions.CXSMILES); // correct
out_cml = result.getCml(); // wrong
out_warnings = warningMessages.toArray(new String[0]); // empty
out_warningTypes = warningTypes.toArray(new String[0]); // empty

dan2097 commented 1 year ago

Sorry for the delay in following up on this. Which tool are you using to read the CML? OpenBabel and CDK give the R enantiomer. I did observe that MarvinJS gave the S enantiomer.

The explicit hydrogens are added because stereochemistry is difficult to express in CML without them in the absence of 2D coordinates. The 0D stereodescriptors reference the 4 atoms at a tetrahedral stereocentre, one of which may be a hydrogen. Additionally, some software has historically misinterpreted CML's hydrogenCount attribute as the implicit hydrogen count, whereas it is supposed to be the total hydrogen count.
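To illustrate why a 0D descriptor that lists 4 atoms is sensitive to how a reader handles that list, here is a minimal sketch in plain Java (no OPSIN or CDK calls). It relies only on the general property of tetrahedral centres that swapping any two of the four referenced atoms inverts the described handedness, while an even permutation leaves it unchanged. The class name, method names, and atom ids (a1, a2, a3, a1_h1) are hypothetical and are not taken from OPSIN's actual CML output; this is not a diagnosis of the MarvinJS result.

// Sketch only: shows how the sign of a 4-atom tetrahedral descriptor
// changes when the same four atoms are listed in a different order.
import java.util.Arrays;
import java.util.List;

final class AtomParityDemo {

    // Returns +1 if 'reordered' is an even permutation of 'reference', -1 if odd.
    // Assumes both lists hold the same distinct atom ids.
    static int permutationSign(List<String> reference, List<String> reordered) {
        int n = reference.size();
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = reference.indexOf(reordered.get(i));
            if (perm[i] < 0) {
                throw new IllegalArgumentException("Unknown atom id: " + reordered.get(i));
            }
        }
        // Count inversions; an odd count means an odd permutation.
        int inversions = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (perm[i] > perm[j]) {
                    inversions++;
                }
            }
        }
        return (inversions % 2 == 0) ? 1 : -1;
    }

    // Parity value (+1 or -1) that describes the same stereocentre
    // when the four atoms are listed in 'reordered' order instead.
    static int equivalentParity(int parity, List<String> reference, List<String> reordered) {
        return parity * permutationSign(reference, reordered);
    }

    public static void main(String[] args) {
        // Hypothetical atom ids; the last one stands for an explicit hydrogen.
        List<String> written = Arrays.asList("a1", "a2", "a3", "a1_h1");
        List<String> swapped = Arrays.asList("a2", "a1", "a3", "a1_h1");
        List<String> rotated = Arrays.asList("a2", "a3", "a1", "a1_h1");

        System.out.println(equivalentParity(1, written, swapped)); // odd permutation
        System.out.println(equivalentParity(1, written, rotated)); // even permutation
    }
}

Under these assumptions, a reader that drops or repositions the explicit hydrogen without adjusting the sign accordingly would report the opposite enantiomer for the same file.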