tl;dr: The A decorator needs to be at the end of a list of decorators.
This was a torsion in a subset of MiniDrugBank where chemper was failing to generate a set of SMIRKS that kept the clustering.
I was very confused because in that particular set, the SMIRKS for that group of torsions was:
'[#6H1X3ar6,#7AH0X2r5;+0;x2:1](-[#1!rX1x0,#16X2r5x2;+0;A;H0]):,=;@[#6Ar5,#6ar6;+0;H0;X3;x2:2](-,:;@[#6AH0r5,#6H1ar6;+0;X3;x2]-,:[#6!rAH0x0,#6H1ar6x2;+0;X3])-;!@[#8AH1X2x0!r+0:3]-;!@[#1AH0X1x0!r+0:4]'
Since I couldn't figure out by eye what wasn't working, I decided to try visualizing pieces of the SMIRKS to see what didn't work. I tried a lot of iterations that I won't share in detail here, but basically I looked at individual pieces of the SMIRKS and different combinations of the decorators.
Finally, I found the issue was with atom 2 with SMIRKS '[#6Ar5,#6ar6;+0;H0;X3;x2:1]' highlighted here:
Now, the part of the SMIRKS that should match that atom is '[#6Ar5:1]', but it doesn't work. However, if you switch the A and r5 decorators to make '[#6r5A:1]', it does work. I think the problem is that the Ar is interpreted as argon rather than as A and r5 with an implicit & between them. I'd rather not start printing every implicit & symbol, so I'm going to try forcing the A decorator to come at the end of each list of decorators.
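A minimal sketch of that reordering, in plain Python. This is not chemper's actual code: the helper names join_decorators and atom_smirks are hypothetical, and it assumes the decorators for one atom are already available as a list of strings joined by implicit &.

```python
def join_decorators(base, decorators):
    """Join decorators with implicit ANDs, moving 'A' to the end.

    Hypothetical helper, not part of chemper. 'A' goes last so it is
    never directly followed by a decorator such as 'r5', which would
    produce the ambiguous text 'Ar'.
    """
    # sorted() is stable: everything except 'A' keeps its original order,
    # and 'A' (key True) sorts after the rest (key False).
    ordered = sorted(decorators, key=lambda d: d == 'A')
    return base + ''.join(ordered)


def atom_smirks(base, decorators, index):
    """Wrap the joined decorators in brackets with an atom map index."""
    return '[%s:%d]' % (join_decorators(base, decorators), index)


print(atom_smirks('#6', ['A', 'r5'], 1))  # [#6r5A:1]
```

With A last, the character after it is always a ',', ';', ':' or ']', so it cannot be read as the start of an element symbol.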