Actelion / openchemlib

Open source Java-based chemistry library
Other
83 stars 29 forks source link

SmilesParser is not idempotent #92

Closed targos closed 5 months ago

targos commented 1 year ago

Once it's been called with readStereoFeatures = true, subsequent calls with readStereoFeatures = false don't work as expected:

SmilesParser parser = new SmilesParser();
StereoMolecule molecule = new StereoMolecule(0, 0);
String smiles = "Cl/C=C/Br";

parser.parse(molecule, smiles.getBytes(), true, false);
System.out.println(molecule.getIDCode()); // gC`DAbZHRVX@

parser.parse(molecule, smiles.getBytes(), true, true);
System.out.println(molecule.getIDCode()); // gC`DAbZHRVXRP

parser.parse(molecule, smiles.getBytes(), true, false);
System.out.println(molecule.getIDCode()); // gC`DAbZHRVXRP

Moreover, if we call it with createCoordinates = false, it will always print gC`DAbZHRVXRP, regardless of the value of readStereoFeatures.

targos commented 9 months ago

@thsa What do you think?

thsa commented 5 months ago

@targos Hi Michael, sorry for the late response. I didn't notice the question. I have just fixed it. In this situation the StereoMolecule was re-using the Canonizer from previous runs and taking parity information from there.

Concerning createCoordinates = false: You cannot use molecule.getIDCode(), if the Molecule has valid parities and does not contain atom coordinates. Thus, after Smiles parsing with createCoordinates = false we have this sutuation and it will always print 'null' now. molecule.getIDCode() is a convenient method, which only works, if a Canonizer was created and used during the ensureHelperArrays() call.

targos commented 5 months ago

Thank you for the fix.