MobleyLab / chemper

Repository for Chemical Perception Sampling Tools
MIT License

Efficiency when searching for SMIRKS matches #12

Open bannanc opened 6 years ago

bannanc commented 6 years ago

This will be the first of a number of issues that I am migrating from the smarty repo. That is, problems around chemical perception searching that we discussed but chose not to address directly in that code. I'm not actually sure if this will end up being as big of a problem here as it was with smarty/smirky.

Essentially, the looping through molecules and SMIRKS patterns was the cause of smirky's slowness. Discussion is available at smarty issue #261. Here is the original text:

I wanted to get an issue started for this that can be used for documentation. I continue to believe that it isn't worth investing significant time into speeding up SMIRKY now, but certain functions from this code will likely carry over to future move proposal engines.

This issue will focus on places where the code can be more efficient, not on making "smarter" chemical moves, though that is also going to be important.

At the OFF meeting this week, Daniel Smith has been helping me diagnose what is causing the code to be slow. We've identified get_typed_molecules as a particularly problematic part of the code; it scales as N^4.
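
For anyone wanting to repeat this kind of diagnosis, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. It is not the smirky code: `toy_match`, `type_molecules`, and `profile_typing` are hypothetical stand-ins, and the "matcher" is just a substring test on strings, assumed only so the example runs on its own.

```python
import cProfile
import io
import pstats


def toy_match(pattern, molecule):
    # Stand-in for a substructure search; NOT a real SMIRKS matcher.
    return pattern in molecule


def type_molecules(patterns, molecules):
    # Hypothetical typing loop: every pattern is tried on every molecule,
    # and the last pattern that matches a molecule wins.
    types = {}
    for mol in molecules:
        for label, pattern in patterns:
            if toy_match(pattern, mol):
                types[mol] = label
    return types


def profile_typing(patterns, molecules):
    # Run the typing loop under the profiler and return both the result
    # and a text report sorted by cumulative time.
    profiler = cProfile.Profile()
    result = profiler.runcall(type_molecules, patterns, molecules)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


patterns = [("generic", "C"), ("carbonyl", "C=O")]
molecules = ["CCO", "CC=O", "CCC=O"]
types, report = profile_typing(patterns, molecules)
print(report)
```

In a real run, the functions at the top of the cumulative-time listing are the candidates for restructuring.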

bannanc commented 5 years ago

I did not bring this up in the preprint, but I think it's something worth thinking about. We don't want making SMIRKS to take significant computing time; it should be the fast step. It's worth considering whether there is a way to flatten the loops currently required for typing molecules.

This was an issue Daniel and I found really early on. However, I think it's possible it's a bigger openforcefield question. Basically, the problem is that looping over all molecules and all SMIRKS patterns doesn't scale well when you have hundreds of molecules. In testing ChemPer, I found that running 5,000 Reducer steps for torsions on the 20 amino acid polypeptide required roughly 5 hours on my laptop.
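
One possible way to cut down the repeated looping, sketched here only as an illustration and not as what ChemPer does, is to cache the matches for each pattern so that a step which changes a single pattern re-searches only that pattern. Everything in the block is hypothetical: `CachedTyper` is an invented name, `toy_match` is a substring test standing in for a real SMIRKS search, and the "later patterns override earlier ones" rule is an assumption of the sketch.

```python
def toy_match(pattern, molecule):
    # Stand-in for a substructure search; NOT a real SMIRKS matcher.
    return pattern in molecule


class CachedTyper:
    """Hypothetical helper that remembers which molecules each pattern matches."""

    def __init__(self, molecules, matcher=toy_match):
        self.molecules = list(molecules)
        self.matcher = matcher
        self.cache = {}        # pattern -> frozenset of matching molecule indices
        self.match_calls = 0   # how many times the expensive matcher has run

    def matches(self, pattern):
        # Search all molecules only the first time a pattern is seen.
        if pattern not in self.cache:
            hits = set()
            for idx, mol in enumerate(self.molecules):
                self.match_calls += 1
                if self.matcher(pattern, mol):
                    hits.add(idx)
            self.cache[pattern] = frozenset(hits)
        return self.cache[pattern]

    def assign(self, patterns):
        # Assumed rule for this sketch: later patterns override earlier ones.
        types = {}
        for label, pattern in patterns:
            for idx in self.matches(pattern):
                types[idx] = label
        return types


molecules = ["CCO", "CC=O", "CCC=O", "CCN"]
typer = CachedTyper(molecules)

first = typer.assign([("generic", "C"), ("carbonyl", "C=O")])
calls_after_first = typer.match_calls

# A proposed step swaps out one pattern; the unchanged one comes from the cache.
second = typer.assign([("generic", "C"), ("amine", "CN")])
calls_after_second = typer.match_calls
```

In this toy run the first assignment costs one matcher call per pattern per molecule, while the second only pays for the new pattern. Whether that helps in practice depends on how many patterns change per step and on how much memory the cached matches take.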