connorcoley / rdchiral

Wrapper for RDKit's RunReactants to improve stereochemistry handling
MIT License

Chiral centers more than one bond away from reaction centers #20

Open ljn917 opened 4 years ago

ljn917 commented 4 years ago

This testing was done with commit 246e171 (current master head), Python 3.8, and RDKit 2020.03.4 (conda-forge).

Below is a reaction from USPTO (PatentNumber US03956392) for which the current code may produce an inconsistent and incorrect template. The ReactionSmiles is [C:1]([CH:4]1[CH:9]([CH3:10])[CH:8]=[CH:7][CH2:6][C:5]1([CH3:12])[CH3:11])(=[O:3])[CH3:2]>C(O)C.[Pd]>[C:1]([C@H:4]1[C@@H:9]([CH3:10])[CH2:8][CH2:7][CH2:6][C:5]1([CH3:11])[CH3:12])(=[O:3])[CH3:2]

The current code gives the forward template [C:1]-[CH;D2;+0:2]=[CH;D2;+0:3]-[CH;D3;+0:4](-[C;D1;H3:5])-[C:6]-[C:7](-[C;D1;H3:8])=[O;D1;H0:9]>>[C:1]-[CH2;D2;+0:2]-[CH2;D2;+0:3]-[C@H;D3;+0:4](-[C;D1;H3:5])-[C:6]-[C:7](-[C;D1;H3:8])=[O;D1;H0:9]

I believe both atoms 4 and 9 (numbered as in the original reaction) should be included, so the expected forward template is: [C;D1;H3:1]-[C;H1;D3;+0:2]1-[C;H1;D2;+0:3]=[C;H1;D2;+0:4]-[C:5]-[C:6]-[C;H1;D3;+0:7]-1-[C:8](-[C;D1;H3:9])=[O;D1;H0:10]>>[C;D1;H3:1]-[C@;H1;D3;+0:2]1-[C;H2;D2;+0:3]-[C;H2;D2;+0:4]-[C:5]-[C:6]-[C@;H1;D3;+0:7]-1-[C:8](-[C;D1;H3:9])=[O;D1;H0:10]

The reason for the current behavior is as follows. The strategy that looks for chiral centers adjacent to reaction centers depends on the order of atoms whenever a chiral center is more than one bond away. In this case, the atoms with mapping numbers 7 and 8 are the reaction centers in the original reaction, and the atoms with mapping numbers 4 and 9 are the related chiral centers. Atom 9 is bonded to reaction center 8, while atom 4 is bonded only to atom 9. If tetra_atoms is [(4, ...), (9, ...)], atom 4 is discarded because atom 9 has not been seen yet (this is the actual situation). If tetra_atoms is instead [(9, ...), (4, ...)], atom 4 is included because atom 9 was included before it. The output SMARTS therefore depends on the order of atoms in the RDKit data structure, which causes inconsistent behavior.
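To make the order dependence concrete, here is a minimal sketch in plain Python. It is a toy model of the behavior described above, not rdchiral's actual code: the bond list is read off the reactant SMILES by atom-map number, and `single_pass` is a hypothetical stand-in for the existing loop over tetra_atoms.

```python
# Toy model only: bonds of the reactant, keyed by atom-map number.
BONDS = [(1, 2), (1, 3), (1, 4), (4, 5), (4, 9), (5, 6), (5, 11),
         (5, 12), (6, 7), (7, 8), (8, 9), (9, 10)]

# Build an undirected adjacency map from the bond list.
NEIGHBORS = {}
for a, b in BONDS:
    NEIGHBORS.setdefault(a, set()).add(b)
    NEIGHBORS.setdefault(b, set()).add(a)

REACTION_CENTERS = {7, 8}


def single_pass(tetra_order):
    """Visit each chiral atom once; keep it if a neighbor is already kept."""
    included = set(REACTION_CENTERS)
    for atom in tetra_order:
        if NEIGHBORS[atom] & included:
            included.add(atom)
    return included


print(sorted(single_pass([4, 9])))  # [7, 8, 9]     atom 4 is dropped
print(sorted(single_pass([9, 4])))  # [4, 7, 8, 9]  atom 4 is kept
```

The same molecule gives two different atom sets, and so two different templates, purely because of the iteration order.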

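The fix would be to use BFS/DFS to search the neighbors of the reaction centers and, in turn, the neighbors of every related chiral center found along the way. A quick and dirty workaround is simply to add `for i in range(len(tetra_atoms)):` before line 174, so that the existing pass is repeated once per chiral atom. Repeating the pass lets inclusion propagate regardless of the atom order, so it gives the same result as a BFS, though with worse time complexity.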
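Both options can be sketched on a toy graph. This is only an illustration under stated assumptions: the function names `related_chiral_bfs` and `related_chiral_repeated` are hypothetical, the adjacency map is built by hand from atom-map numbers rather than from RDKit atoms, and neither function is taken from rdchiral.

```python
from collections import deque

# Toy model only: bonds of the reactant, keyed by atom-map number.
BONDS = [(1, 2), (1, 3), (1, 4), (4, 5), (4, 9), (5, 6), (5, 11),
         (5, 12), (6, 7), (7, 8), (8, 9), (9, 10)]


def build_neighbors(bonds):
    """Return an undirected adjacency map {atom: set(neighbors)}."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def related_chiral_bfs(neighbors, reaction_centers, chiral_atoms):
    """BFS from the reaction centers, expanding only through chiral atoms."""
    included = set(reaction_centers)
    queue = deque(reaction_centers)
    while queue:
        atom = queue.popleft()
        for nbr in neighbors[atom]:
            if nbr in chiral_atoms and nbr not in included:
                included.add(nbr)
                queue.append(nbr)
    return included


def related_chiral_repeated(neighbors, reaction_centers, tetra_order):
    """Workaround: repeat the single pass len(tetra_order) times."""
    included = set(reaction_centers)
    for _ in range(len(tetra_order)):
        for atom in tetra_order:
            if neighbors[atom] & included:
                included.add(atom)
    return included


nbrs = build_neighbors(BONDS)
print(sorted(related_chiral_bfs(nbrs, {7, 8}, {4, 9})))       # [4, 7, 8, 9]
print(sorted(related_chiral_repeated(nbrs, {7, 8}, [4, 9])))  # [4, 7, 8, 9]
print(sorted(related_chiral_repeated(nbrs, {7, 8}, [9, 4])))  # [4, 7, 8, 9]
```

The BFS touches each atom at most once, while the repeated pass does up to len(tetra_atoms) full scans; both return the same set for either ordering of the chiral atoms.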

thomasstruble commented 4 years ago

This is an interesting case. The reaction does not set any stereocenters; the recorded product includes them only to show the relative stereochemistry, since the syn isomer was isolated at 90% and the trans at 10%. That ratio, however, is inherited from the starting material, since they followed the prep (K. S. Ayyar Chem. Comm. 1973, 161). I will have to look closer at the extraction and try to find cases where distal stereocenters are set from the reactive center.