ADicksonLab / wepy

Weighted Ensemble simulation framework in Python
https://adicksonlab.github.io/wepy/index.html
MIT License

Try to detect dynamics errors post-segment #116

Open salotz opened 8 months ago

salotz commented 8 months ago

In some cases dynamics segments (with OpenMM) can return NaNs, which then propagate through the rest of the code until they hit something that can't handle them. That point may be far down the chain from the dynamics itself.

This can happen when you've interpreted the units of an input file incorrectly (angstroms in the file, assumed to be nanometers by OpenMM), or when a simulation "explodes" because you've constructed it in an "incorrect" but structurally valid way.

In any case it would be very useful to have some simple checks for NaNs at the end of each segment.

It's unclear what the action should be, though. The obvious ones include logging the observation and raising an error.

The logging one is nice in case you specifically expected the NaNs (not sure this is a real use case) or they are otherwise not actually an issue. If you do get some blowup later, you can check the log and see whether NaNs were observed, and where/when.

The error is nice in that you know exactly where things went "wrong"; however, it is inflexible and hard to get around if you really need to.

It's also unclear where this check should be done. Should the runner check? Should there be an optional filter post-MD, or perhaps a post_cycle hook with custom logic or common options?

At minimum we could start with the runner logging this kind of observation since that is non-invasive and would aid in debugging.
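
A rough sketch of what such a post-segment check could look like, with the action (log or raise) selectable. This is not wepy's API: `nan_walkers`, `check_segment`, and `SegmentNaNError` are hypothetical names, and each walker is assumed to be a plain sequence of (x, y, z) tuples rather than a real wepy walker state.

```python
import logging
import math

logger = logging.getLogger("segment_nan_check")  # hypothetical logger name


class SegmentNaNError(RuntimeError):
    """Hypothetical error for NaN coordinates found after a segment."""


def nan_walkers(walker_positions):
    """Return the indices of walkers that have at least one NaN coordinate.

    Assumption: `walker_positions` has one entry per walker, and each
    entry is a sequence of (x, y, z) tuples of floats.
    """
    bad = []
    for idx, positions in enumerate(walker_positions):
        if any(math.isnan(coord) for atom in positions for coord in atom):
            bad.append(idx)
    return bad


def check_segment(walker_positions, cycle_idx, action="log"):
    """Check walkers after a segment, then log a warning or raise."""
    if action not in ("log", "raise"):
        raise ValueError(f"unknown action: {action!r}")
    bad = nan_walkers(walker_positions)
    if bad:
        msg = f"cycle {cycle_idx}: NaN coordinates in walkers {bad}"
        if action == "raise":
            raise SegmentNaNError(msg)
        logger.warning(msg)
    return bad
```

With `action="log"` the run continues and the log records which cycle and which walkers were affected; with `action="raise"` it stops at the first cycle where NaNs appear.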

alexrd commented 8 months ago

In my experience OpenMM already checks and raises an error ("particle coordinate is NaN!"). But I think there could be some work to do in wepy for dealing with failed segments more gracefully.
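
One possible shape for handling failed segments more gracefully is to catch the dynamics error per walker and record it instead of letting it abort the whole cycle. The sketch below is only an illustration: `SegmentResult` and `run_segments` are hypothetical names, `run_one` stands in for whatever runs a single segment, and the exception types to catch are left to the caller (with OpenMM that would presumably be its own exception class).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SegmentResult:
    """Hypothetical record of one walker's segment outcome."""
    walker_idx: int
    state: Any
    error: Optional[str] = None

    @property
    def failed(self):
        return self.error is not None


def run_segments(run_one, states, error_types=(Exception,)):
    """Run `run_one` on each state, recording failures instead of aborting.

    Assumption: `run_one(state)` returns the new state or raises one of
    `error_types` when the dynamics fail.
    """
    results = []
    for idx, state in enumerate(states):
        try:
            results.append(SegmentResult(idx, run_one(state)))
        except error_types as exc:
            # Keep the pre-segment state so the caller can decide whether
            # to retry, drop, or replace this walker.
            results.append(SegmentResult(idx, state, error=str(exc)))
    return results
```

What to do with a failed walker (retry, discard, or stop the run) is left open here, since that is the part that would need design work.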

salotz commented 8 months ago

Yes, that is true. I remembered/realized this after I posted. I might have been getting NaNs somewhere else.