Closed by lohedges 1 month ago
I believe this is related to the change in LJ sigma values for perturbed atoms, i.e. they are no longer zero for ghost atoms. This is probably leading to local spikes in potential energy at specific lambda values, which is causing issues for the minimiser. I see this on the `main` branch of `somd2` and have many issues re-running things that worked fine before, e.g. the hydration free-energy test set.
Yes, this is definitely related to the LJ sigma problem. For `somd2`, I realised that we don't use `dynamics.minimise()`, instead calling `.minimisation()` on the system prior to creating the dynamics object. It looks like that code is old and predates the setting of parameters such as `shift_delta` and `coulomb_power`, hence they take their default values for the minimisation. This means that the minimisation in `somd2` seems to work a bit more reliably than calling `dynamics.minimise()` directly, which was using the optimised `somd1` settings, not the defaults. This is clearly a bug in `somd2`, since it should use consistent settings, but it highlights that it is indeed the LJ sigmas that are causing the minimiser to struggle.
This seems to mostly be resolved via #237. When I get a chance I'll take a closer look at the code to see if it's easy to add logic for a walltime, or similar.
In quite a few instances I am seeing hangs with the modified OpenMM minimiser. I've tried to mitigate this using the `max_iterations` kwarg, but the problem persists. It seems that the number of iterations might be set to zero for each ratchet. The available options are:
If it's not possible to set things independently, i.e. an absolute number of iterations as the max, then maybe we also want to implement a timeout of some sort? I'll try to find a system that repeatedly hangs to post for debugging purposes. That's one of the other things that's a bit frustrating, since this doesn't happen all the time, e.g. you might see it for one window of one replica of an RBFE run, which causes the whole thing to fail.
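As a stopgap for the timeout idea, one option is to enforce a walltime from outside the minimiser by running each window's minimisation in a child process and killing it if it overruns. The following is only a minimal sketch using the Python standard library; it is not how sire or `somd2` handle this, and the helper name `run_with_walltime` and the idea of a per-window command are assumptions made for illustration.

```python
import subprocess
import sys


def run_with_walltime(cmd, walltime):
    """Run `cmd` (a list of strings) in a child process.

    Returns True if the command finishes within `walltime` seconds and
    False if it had to be killed. A non-zero exit status raises
    subprocess.CalledProcessError, so genuine failures are not hidden.
    """
    try:
        # subprocess.run kills the child and waits for it before
        # re-raising TimeoutExpired, so no hung process is left behind.
        subprocess.run(cmd, check=True, timeout=walltime)
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Stand-in for a hung minimisation: a child that sleeps far longer
    # than the allowed walltime.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    finished = run_with_walltime(hung, walltime=1.0)
    print("finished" if finished else "timed out, window could be retried")
```

In practice the command would be whatever launches the minimisation for a single lambda window, so that one hung window can be detected and retried rather than failing the whole run. The cost is that the minimised system has to be passed back via a file, since the work happens in a separate process.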