Open hesther opened 1 year ago
That is a very interesting edge case, and an unfortunate bug. It's not immediately clear to me why this would be the case since the bond direction map is built-up from the atom mapping (here). I don't believe RDKit would be making any assumptions or errors about atom map numbers and the directionality of the bond in terms of ENDDOWNRIGHT and ENDUPRIGHT flags. It seems like something must be wrong in my possible_defs definition that is only triggered when the atom mapping is "unusual".
I'm afraid that we don't have anyone actively working on RDChiral at the moment. I will tag @ljn917 in case this is something he encountered when working on the C++ version. My suggestion would otherwise be to drop in some debugging statements into the function that restores bond stereochemistry in that bonds.py file to start to narrow down where the bug is coming from
I confirm that the C++ version also has this bug. I encountered an rdkit bug where renumbering changes double bond directions, though I don't think they are related.
I tried to narrow it down a bit by simplifying the reactant structure, but find more interesting behaviors. Please see the script below. It was run with the C++ version. In summary, to trigger this bug, a second double bond (between mapno 9 and 10) must exist, and the atom 9 (mapno) must be substituted, no matter the substituents are symmetric or not.
Update: the two double bonds must be next to each other (in conjugation).
#!/usr/bin/env python3
import random
from rdkit import Chem
from rdchiral.main import rdchiralRun, rdchiralReaction, rdchiralReactants
random.seed(0)
def randomize_mapno(s: str):
mol = Chem.MolFromSmiles(s)
x=[x for x in range(1,50)]
random.shuffle(x)
x=[0]+x
for a in mol.GetAtoms():
a.SetAtomMapNum(x[a.GetAtomMapNum()])
return Chem.MolToSmiles(mol)
# smi1 = r'[CH3:9][CH2:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 5: correct behavior
# smi1 = r'[CH2:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 4: correct behavior
smi1 = r'[CH3:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 3: incorrect behavior
#smi1 = r'[CH3:8][CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 3-1: incorrect behavior
# smi1 = r'[CH2:4]=[CH:5]/[C:6]([CH3:7])=[CH:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 2: incorrect behavior
# smi1 = '[CH3:1][C:2]1=[C:3](/[CH:4]=[CH:5]/[C:6]([CH3:7])=[CH:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15])[C:16]([CH3:17])([CH3:18])[CH2:19][CH2:20][CH2:21]1' # case 1: incorrect behavior
template = r'[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH2;D2;+0:5]-[OH;D1;+0:6]>>[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH;D2;+0:5]=[O;H0;D1;+0:6]'
outcomes1 = rdchiralRun(rdchiralReaction(template), rdchiralReactants(smi1), combine_enantiomers=False)
print("right outcome:",outcomes1)
#smi2 = r'[CH3:48][CH2:8]\[C:34](=[CH:21]\[CH2:49][OH:6])[CH3:35]'
for cnt in range(100000):
smi2 = randomize_mapno(smi1)
outcomes2 = rdchiralRun(rdchiralReaction(template), rdchiralReactants(smi2), combine_enantiomers=False)
# print(smi2, outcomes2[0])
if outcomes1[0] != outcomes2[0]:
print(f'wrong, smi2={smi2}, outcomes2={outcomes2[0]}')
else:
# print('correct')
pass
That's interesting ... I will run a couple of tests tonight, let's see whether we can narrow down which part of the code it comes from (shouldn't be too hard with a couple of print statements)
Update: Identified the issue. If a template specifies one or multiple double bonds that are in a conjugated system and reverses a bond direction (which happens based on the atom order which in turn depends on the map number) we do not reverse all other bond directions in the conjugated system (which would be the correct thing to do). So the copied over bond directions in unchanged parts of the molecule might not match the bond directions set by the template if the system is conjugated. I am now working on a fix (check whether a conjugated system exists and whether changes were made to any bond, if so, correct all bond directions). Will post a PR in the next days
Very interesting & good catch! While the test cases are not that exhaustive, would you also be able to add this example into the list?
Hi, I just noticed that the outcome of RunReaction depends on the atom ordering of the input molecule for some edge cases.
For example, applying the template
'[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH2;D2;+0:5]-[OH;D1;+0:6]>>[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH;D2;+0:5]=[O;H0;D1;+0:6]'
(a simple reduction of an aldehyde to an alcohol next to a double bond, which does not change the double bond or its stereochemistry) to the molecule'[CH3:1][C:2]1=[C:3](/[CH:4]=[CH:5]/[C:6]([CH3:7])=[CH:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15])[C:16]([CH3:17])([CH3:18])[CH2:19][CH2:20][CH2:21]1'
correctly produces'CC1=C(/C=C/C(C)=C/C=C/C(C)=C/C=O)C(C)(C)CCC1'
while applying the same template to the molecule
'[CH3:1][C:13](=[CH:28]\\[CH:48]=[CH:8]\\[C:34](=[CH:21]\\[CH2:49][OH:6])[CH3:35])/[CH:37]=[CH:15]/[C:29]1=[C:2]([CH3:44])[CH2:11][CH2:16][CH2:41][C:39]1([CH3:12])[CH3:24]'
(exactly the same as above, only different atom numbering) wrongly produces'CC1=C(/C=C/C(C)=C/C=C/C(C)=C\\C=O)C(C)(C)CCC1'
(i.e. flipped a trans to a cis bond)Code to reproduce:
Is there any way this could be fixed in the near future? (Am relying on rdchiral for a big project)