SEDenmarkLab / molli

Molecular Library Toolbox
https://molli.readthedocs.io
MIT License

CDXML Parsing/Implicit Hydrogen Error in Certain Ring Systems #159

Open blakeo2 opened 1 week ago

blakeo2 commented 1 week ago

I found a new issue in the CDXML parser that affects multi-ring and spirocyclic systems. Because several parts of the ring get bent at the same time during parsing, the parser can sometimes place the ring such that the atoms around a stereocenter cause the implicit hydrogen to be added with the incorrect stereochemistry.
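
To illustrate why the placement matters, here is a minimal geometric sketch. It is not molli's code. It assumes the implicit hydrogen is placed opposite the sum of the three heavy-atom bond vectors (an assumption made for illustration), and all function names are hypothetical. If the three neighbors end up puckered toward the other face of the stereocenter, the hydrogen lands on the opposite side and the handedness of the center flips.

```python
def sub(a, b):
    """Component-wise a - b for 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def triple_product(u, v, w):
    """Scalar triple product u . (v x w)."""
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(a * b for a, b in zip(u, cross))


def implicit_h_direction(center, neighbors):
    """Assumed convention: H points opposite the summed bond vectors."""
    bonds = [sub(n, center) for n in neighbors]
    return tuple(-sum(b[i] for b in bonds) for i in range(3))


def handedness(center, neighbors):
    """Sign (+1 or -1) of the center's chirality, with the implicit H
    as the fourth substituent. The neighbor order is kept fixed."""
    h = implicit_h_direction(center, neighbors)
    a, b, c = (sub(sub(n, center), h) for n in neighbors)
    return 1 if triple_product(a, b, c) > 0 else -1


CENTER = (0.0, 0.0, 0.0)
# Same three neighbors, puckered below or above the plane of the center.
BENT_DOWN = [(1.0, 0.0, -0.3), (-0.5, 0.866, -0.3), (-0.5, -0.866, -0.3)]
BENT_UP = [(x, y, -z) for x, y, z in BENT_DOWN]

print(implicit_h_direction(CENTER, BENT_DOWN), handedness(CENTER, BENT_DOWN))
print(implicit_h_direction(CENTER, BENT_UP), handedness(CENTER, BENT_UP))
```

The two printed lines have opposite hydrogen directions and opposite signs, which is the kind of inversion described above.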

The current workaround is to draw the hydrogens on the stereocenters explicitly in the CDXML file. A general solution would make it less obscure where and when explicit hydrogens need to be added to stereocenters. A warning might also be helpful.
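
As a rough idea of what such a warning could look like, here is a sketch that uses only the Python standard library and does not touch molli's parser. It assumes the usual CDXML layout: `n` elements for atoms (a missing `Element` attribute means carbon), `b` elements for bonds with `B`/`E` endpoints, and wedge information in the bond's `Display` attribute. The function names are hypothetical, and the sketch ignores nested fragments and nicknames.

```python
import warnings
import xml.etree.ElementTree as ET

# "Display" values whose narrow end sits at the begin (B) or end (E) atom.
NARROW_AT_BEGIN = {"WedgeBegin", "WedgedHashBegin"}
NARROW_AT_END = {"WedgeEnd", "WedgedHashEnd"}


def stereocenters_without_explicit_h(cdxml_text):
    """Return ids of wedge-marked atoms that have exactly three drawn
    neighbors, none of which is hydrogen (so the fourth is implicit)."""
    root = ET.fromstring(cdxml_text)
    # Atomic number per atom id; CDXML omits Element for carbon.
    element = {n.get("id"): n.get("Element", "6") for n in root.iter("n")}
    neighbors = {atom_id: [] for atom_id in element}
    centers = set()
    for bond in root.iter("b"):
        begin, end = bond.get("B"), bond.get("E")
        if begin not in neighbors or end not in neighbors:
            continue  # skip bonds to atoms this sketch did not collect
        neighbors[begin].append(end)
        neighbors[end].append(begin)
        display = bond.get("Display")
        if display in NARROW_AT_BEGIN:
            centers.add(begin)
        elif display in NARROW_AT_END:
            centers.add(end)
    return [
        c for c in sorted(centers)
        if len(neighbors[c]) == 3
        and all(element[x] != "1" for x in neighbors[c])
    ]


def warn_on_implicit_h_stereocenters(cdxml_text):
    """Emit one warning per flagged atom and return the flagged ids."""
    flagged = stereocenters_without_explicit_h(cdxml_text)
    for atom_id in flagged:
        warnings.warn(
            f"Stereocenter (node id {atom_id}) relies on an implicit "
            "hydrogen; consider drawing it explicitly."
        )
    return flagged


# Tiny hand-written example: atom 1 carries a wedge and has no drawn H.
SAMPLE = """<CDXML><page><fragment>
<n id="1" p="0 0"/><n id="2" p="10 0"/><n id="3" p="-5 8"/>
<n id="4" p="-5 -8" Element="8"/>
<b id="5" B="1" E="2"/><b id="6" B="1" E="3"/>
<b id="7" B="1" E="4" Display="WedgeBegin"/>
</fragment></page></CDXML>"""

print(stereocenters_without_explicit_h(SAMPLE))
```

A check like this only says where an implicit hydrogen is being relied on. It does not tell whether the parsed stereochemistry is actually wrong.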

CDXML_Problem.zip

KarnParmar commented 6 days ago

code_w_cdxml.zip

Here is an updated CDXML file and the code I used to check the structures. Here are my results: [image]

esalx commented 4 days ago

@blakeo2 just to clarify: is the last CDXML that you uploaded with the test code supposed to be a set of test cases for this problem?

blakeo2 commented 4 days ago

@esalx the one I uploaded with the test code illustrates how I had to draw the structures to fix them. The other structure is the one with the incorrect stereocenters, for reproducing the error.