MobleyLab / chemper

Repository for Chemical Perception Sampling Tools
MIT License

Add more efficient decorator removal in the smirksifier #40

Closed bannanc closed 5 years ago

bannanc commented 5 years ago
  1. Can we make better/more efficient changes when removing SMIRKS decorators?

In the example above, there is no reason why the final SMIRKS could not be:

zz_sing              | [#7,#8,#6:1]-[#1,#6:2] 
--------------------------------------------------------------------------------
zz_aromatic          | [*:1]:;@[*:2] 
--------------------------------------------------------------------------------
zz_double            | [*:2]=[*:1]

However, the single bonds are left at "[#8,#7,#6:1]-[#1,#6:2]" because only one decorator can be removed per move. For example, a move that removes #8 is allowed, but it would leave you with "[#7,#6:1]-[#1,#6:2]", which no longer matches the single bonds that include oxygen.

There are "faster" ways to test these strategies. For example, we could allow random moves that remove all decorators from an atom at once and replace them with a *.
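To make the difference concrete, here is a minimal sketch in plain Python. It does not use chemper's API: `atom_smirks`, `matches`, `remove_one`, and `remove_all` are hypothetical helpers, and an atom is modeled only as a list of OR decorators with a toy matching rule.

```python
# Toy model (NOT chemper's API): an atom is just a list of OR decorators.

def atom_smirks(or_decorators, index):
    """Render a toy atom; an empty decorator list becomes the wildcard '*'."""
    body = ",".join(or_decorators) if or_decorators else "*"
    return f"[{body}:{index}]"

def matches(or_decorators, element):
    """Toy match: no decorators matches anything, else the element must be listed."""
    return not or_decorators or element in or_decorators

def remove_one(or_decorators, decorator):
    """Single-decorator move: drop exactly one OR decorator."""
    return [d for d in or_decorators if d != decorator]

def remove_all(or_decorators):
    """Bulk move: drop every OR decorator, leaving a wildcard atom."""
    return []

atom1 = ["#8", "#7", "#6"]
required = ["#8", "#7", "#6"]  # elements the pattern must keep matching

single = remove_one(atom1, "#8")
single_ok = all(matches(single, e) for e in required)
print(atom_smirks(single, 1), single_ok)

bulk = remove_all(atom1)
bulk_ok = all(matches(bulk, e) for e in required)
print(atom_smirks(bulk, 1), bulk_ok)
```

In this toy setting the single removal loses oxygen and would be rejected, while the bulk removal reaches the wildcard in one step and still matches everything required.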

Another option here would be to reduce the number of steps by only trying to remove decorators from atoms/bonds that currently have decorators. A common rejected move above is one where the atom [*:1] is chosen and the move is then rejected because there are no decorators to remove. It isn't obvious whether it is better to "waste" an iteration that leaves the SMIRKS unchanged or to spend computer time checking whether the atom is a "valid" option. It seems unlikely that either choice would make a significant difference in computer time with a few molecules. However, when you are typing hundreds or more molecules, the time for checking the SMIRKS will increase significantly, so decreasing the number of possible iterations seems like a promising step.
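A rough sketch of that filtering step, again in plain Python rather than chemper's API. The dictionary layout and the names `decorated_keys` and `pick_target` are assumptions made for illustration only.

```python
import random

def decorated_keys(decorators_by_key):
    """Return only the atoms/bonds that still carry at least one decorator."""
    return [key for key, decs in decorators_by_key.items() if decs]

def pick_target(decorators_by_key, rng):
    """Choose a removal target among decorated atoms/bonds, or None if none exist."""
    candidates = decorated_keys(decorators_by_key)
    if not candidates:
        return None  # nothing left to remove, so no iteration is spent on it
    return rng.choice(candidates)

# Hypothetical layout for "[*:1]:;@[*:2]": only the bond has decorators.
pattern = {"atom:1": [], "bond:1-2": [":", "@"], "atom:2": []}
rng = random.Random(0)
target = pick_target(pattern, rng)
print(target)
```

The check is a single pass over the atoms and bonds, so it should be cheap compared with re-typing many molecules, though that trade-off would need to be measured.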

I see three different types of moves that should be added at this point:

  1. Remove whole non-indexed atoms
  2. Remove all decorators of a given type (e.g. not just one AND decorator, but all AND decorators)
  3. Remove "empty" atoms/bonds from consideration when choosing what to change.
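The first two moves in the list above could look roughly like this. This is a sketch under stated assumptions, not chemper's implementation: `ToyAtom`, `remove_unindexed_atoms`, and `clear_and_decorators` are hypothetical names, and the pattern is modeled as a dict of atoms plus a list of bonded pairs.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ToyAtom:
    """Hypothetical stand-in for a SMIRKS atom (not a chemper class)."""
    index: Optional[int]  # None means the atom is not indexed
    or_decorators: List[str] = field(default_factory=list)
    and_decorators: List[str] = field(default_factory=list)

def remove_unindexed_atoms(atoms, bonds):
    """Move 1: drop every non-indexed atom together with its bonds."""
    kept = {name: a for name, a in atoms.items() if a.index is not None}
    kept_bonds = [(a, b) for a, b in bonds if a in kept and b in kept]
    return kept, kept_bonds

def clear_and_decorators(atom):
    """Move 2: remove all AND decorators from an atom in a single step."""
    return ToyAtom(atom.index, list(atom.or_decorators), [])

atoms = {
    "a": ToyAtom(1, ["#6"], ["X4", "H3"]),
    "b": ToyAtom(2, ["#1"]),
    "c": ToyAtom(None, ["#8"], ["H1"]),  # non-indexed neighbor of "a"
}
bonds = [("a", "b"), ("a", "c")]

kept_atoms, kept_bonds = remove_unindexed_atoms(atoms, bonds)
stripped = clear_and_decorators(atoms["a"])
print(sorted(kept_atoms), kept_bonds, stripped.and_decorators)
```

Both helpers return new objects instead of mutating their inputs, which makes it easy to keep the previous SMIRKS if the proposed move is rejected.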

bannanc commented 5 years ago

Merged in PR #45.