At the bottom of the function, the sampler states are saved to the checkpoint. However, even though the replica thermodynamic states are updated during equilibration, they are not saved into the checkpoint.
I would imagine we would want to keep the file consistent with the simulation, but I could be missing something. Is this expected/desired behavior?
I ask because I am running in unreliable environments, and I see many NaNs whenever a long-running simulation has to restart.
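To make the concern concrete, here is a toy sketch of the mismatch I have in mind. It does not use the openmmtools API: `run_equilibration`, `write_checkpoint`, and `is_consistent` are hypothetical helpers, the checkpoint is a plain dict, and the "equilibration" simply shifts positions and reverses the replica-to-state assignment so that both pieces of state change.

```python
# Toy illustration only: every name here is hypothetical, not openmmtools code.

def run_equilibration(positions, replica_states):
    """Pretend equilibration: move every replica and permute the state assignment."""
    new_positions = [x + 1.0 for x in positions]
    new_states = replica_states[::-1]  # stand-in for an updated replica-to-state mapping
    return new_positions, new_states

def write_checkpoint(checkpoint, positions, replica_states=None):
    """Store positions; store the state assignment only if it is passed in."""
    checkpoint["positions"] = list(positions)
    if replica_states is not None:
        checkpoint["replica_states"] = list(replica_states)

def is_consistent(checkpoint, positions, replica_states):
    """True when the checkpoint matches the in-memory simulation."""
    return (checkpoint["positions"] == positions
            and checkpoint["replica_states"] == replica_states)

# Initial checkpoint written before equilibration.
positions = [0.0, 0.0, 0.0]
replica_states = [0, 1, 2]
checkpoint = {}
write_checkpoint(checkpoint, positions, replica_states)

positions, replica_states = run_equilibration(positions, replica_states)

# Case 1: only the positions are saved, as I believe happens now.
write_checkpoint(checkpoint, positions)
positions_only_ok = is_consistent(checkpoint, positions, replica_states)

# Case 2: positions and the state assignment are saved together.
write_checkpoint(checkpoint, positions, replica_states)
both_saved_ok = is_consistent(checkpoint, positions, replica_states)

print(positions_only_ok, both_saved_ok)
```

In the first case a restart would pair the equilibrated positions with the pre-equilibration state assignment, which is the kind of inconsistency I suspect is behind the NaNs, though I have not confirmed that.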
This is a good question. @ijpulidos, do you know if there is a reason we do it this way? I am trying to think of a scenario where this behavior would be desirable.
https://github.com/choderalab/openmmtools/blob/9334bc9cf7d31be62926a3867503d6c65e0a305a/openmmtools/multistate/multistatesampler.py#L704C18-L704C47