MobleyLab / chemper

Repository for Chemical Perception Sampling Tools
MIT License

Fix Decorator order to be consistently bug free #62

Closed bannanc closed 5 years ago

bannanc commented 5 years ago

tl;dr: The A (aliphatic) decorator needs to be at the end of a list of decorators.

[image]

This was a torsion in a subset of MiniDrugBank where chemper was failing to generate a set of SMIRKS that preserved the clustering.

I was very confused because in that particular set, the SMIRKS for that group of torsions was: '[#6H1X3ar6,#7AH0X2r5;+0;x2:1](-[#1!rX1x0,#16X2r5x2;+0;A;H0]):,=;@[#6Ar5,#6ar6;+0;H0;X3;x2:2](-,:;@[#6AH0r5,#6H1ar6;+0;X3;x2]-,:[#6!rAH0x0,#6H1ar6x2;+0;X3])-;!@[#8AH1X2x0!r+0:3]-;!@[#1AH0X1x0!r+0:4]'

Since I couldn't figure out by eye what wasn't working, I decided to try visualizing pieces of the SMIRKS to see what didn't work. I tried a lot of iterations that I won't share in detail here, but they basically looked at individual pieces of the SMIRKS and different combinations of the decorators.
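For anyone wanting to repeat this kind of piece-by-piece check, here is a minimal sketch of how the SMIRKS above could be broken into testable pieces. The helpers `atom_pieces` and `or_alternatives` are hypothetical names made up for this comment (they are not part of chemper), and the sketch assumes the SMIRKS has no nested brackets (no recursive `$(...)` patterns) and that the OR'd alternatives come before the first `;`, as they do in the pattern above.

```python
import re

SMIRKS = ('[#6H1X3ar6,#7AH0X2r5;+0;x2:1](-[#1!rX1x0,#16X2r5x2;+0;A;H0]):,=;@'
          '[#6Ar5,#6ar6;+0;H0;X3;x2:2](-,:;@[#6AH0r5,#6H1ar6;+0;X3;x2]-,:'
          '[#6!rAH0x0,#6H1ar6x2;+0;X3])-;!@[#8AH1X2x0!r+0:3]-;!@[#1AH0X1x0!r+0:4]')

def atom_pieces(smirks):
    """Return each bracketed atom expression in order.

    Assumption: no nested brackets, so a simple regex is enough.
    """
    return re.findall(r'\[[^\]]+\]', smirks)

def or_alternatives(atom):
    """Split one atom expression into its comma-separated OR alternatives.

    The trailing :n map index and the ;-joined AND decorators are dropped.
    Assumption: the OR'd part is everything before the first ';'.
    """
    inner = re.sub(r':\d+$', '', atom[1:-1])  # strip brackets and map index
    return inner.split(';')[0].split(',')

for atom in atom_pieces(SMIRKS):
    # Wrap each alternative as a stand-alone, single-atom SMIRKS to test
    print(atom, '->', ['[%s:1]' % alt for alt in or_alternatives(atom)])
```

Each single-atom pattern printed this way can then be matched or visualized on its own with whatever toolkit you are using.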

Finally, I found the issue was with atom 2 with SMIRKS '[#6Ar5,#6ar6;+0;H0;X3;x2:1]' highlighted here:

[image: atom 2 highlighted]

Now, the part of the SMIRKS that should match that atom is '[#6Ar5:1]', but it doesn't work. However, if you switch the A and r5 decorators to make '[#6r5A:1]', it does work. I think the problem here is that the Ar is interpreted as argon instead of as A and r5 with an implicit & between them. I'd rather not start printing every implicit & symbol, so I'm going to try forcing the A decorator to come at the end of the list of decorators.
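A minimal sketch of the reordering idea, assuming the decorators for one atom are available as a list of strings. The names `order_decorators` and `atom_smirks` are hypothetical and only illustrate the approach; this is not chemper's actual code, and the real fix may look different.

```python
def order_decorators(decorators):
    """Return the decorators with 'A' moved to the end.

    sorted() is stable, so every other decorator keeps its relative order.
    """
    return sorted(decorators, key=lambda d: d == 'A')

def atom_smirks(base, decorators, index):
    """Build a single-atom SMIRKS with implicit ANDs between decorators."""
    return '[%s%s:%d]' % (base, ''.join(order_decorators(decorators)), index)

# 'A' followed directly by 'r5' would read as 'Ar5' (argon), so 'A' goes last
print(atom_smirks('#6', ['A', 'r5'], 1))        # [#6r5A:1]
print(atom_smirks('#6', ['A', 'H0', 'r5'], 1))  # [#6H0r5A:1]
```

With A last, it can only be followed by a separator (`,` or `;`), the map index, or the closing bracket, so it can no longer merge with the next decorator to spell an element symbol.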

bannanc commented 5 years ago

This is being fixed by PR #65