jchodera opened this issue 10 years ago
So I agree that we want to avoid using a fixed equilibration length, but to really make your scheme widely used we need to embed it in the simulation software. It really needs to be fully automated; otherwise people won't bother. It doesn't have to be hard: we could probably implement it in OpenMM via some sort of EquilibrationReporter object that accumulates various 1D observables and runs equilibration detection on them. We could include this reporter in PyMBAR and have it conda-installable, so it wouldn't be burdensome for people to use, at least in OpenMM.
Regarding the output frequency, we have already done the statistical inefficiency calculation for a few systems. A reasonable spacing for uncorrelated samples would be ~3 ps; 10 ps is probably wasting cycles. Obviously this was a preliminary calculation that we did long ago and will need to be revisited.
To clarify, my 3 ps guess is based on the inefficiency of the density, not the potential, so that will need to be repeated as well. I think we might as well go to 5 ps or so for now to avoid clogging our disk space.
Regarding the length of the production simulations: if we implement the automated equilibration detection, we should also have a crude estimate of how long we need our production runs to be, right?
Why do we need a density reporter if we already have a StateReporter that can be used to calculate the density?
Regarding precision modes: the default is Mixed, correct? IMHO we should not even be considering single, right?
I think it makes sense to triage this list in order of how quickly each task can be implemented. Below is my interpretation:
> So I agree that we want to avoid using a fixed equilibration length, but to really make your scheme widely used we need to embed it in the simulation software. It really needs to be fully automated; otherwise people won't bother. It doesn't have to be hard: we could probably implement it in OpenMM via some sort of EquilibrationReporter object that accumulates various 1D observables and runs equilibration detection on them. We could include this reporter in PyMBAR and have it conda-installable, so it wouldn't be burdensome for people to use, at least in OpenMM.
This isn't quite appropriate or practical, since the equilibration detection is a purely post-processing method. If we're interested in the density, we simply store the entire density timeseries during the simulation at some convenient sampling interval and then use the `pymbar.timeseries.detectEquilibration()` method to extract the amount of data to discard to equilibration and the statistical inefficiency. Usage looks like this:
```python
import numpy
from pymbar import timeseries

# Detect equilibrated region and compute statistical inefficiency of equilibrated region.
[t0, g, Neff] = timeseries.detectEquilibration(density_timeseries)
# Discard initial data to equilibration.
density_timeseries = density_timeseries[t0:]
# Compute expectation and statistical error estimate.
density_mean = density_timeseries.mean()
density_mean_stderr = density_timeseries.std() / numpy.sqrt(Neff)
```
We are already conda-packaging pymbar and including it as part of Omnia. Unless we also construct an automated analysis facility for the OpenMM app, there's no tighter way to integrate it, other than perhaps to create a single function that implements the small code block above.
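For what it's worth, that single function might look something like this minimal sketch (the name `analyze_observable` is hypothetical, not an existing pymbar API):

```python
import numpy
from pymbar import timeseries

def analyze_observable(A_t):
    """Return (mean, stderr) of a 1D observable timeseries after discarding
    the initial transient found by automated equilibration detection."""
    # Detect equilibration start t0, statistical inefficiency g, and
    # effective number of uncorrelated samples Neff.
    [t0, g, Neff] = timeseries.detectEquilibration(A_t)
    # Discard the initial data to equilibration.
    A_equil = A_t[t0:]
    # Compute expectation and statistical error estimate.
    mean = A_equil.mean()
    stderr = A_equil.std() / numpy.sqrt(Neff)
    return (mean, stderr)
```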
Separately, I am lobbying people like Schrödinger to do this automatically.
And writing a short 2-4 page paper soon would be very useful as well.
> Regarding the output frequency, we have already done the statistical inefficiency calculation for a few systems. A reasonable spacing for uncorrelated samples would be ~3 ps; 10 ps is probably wasting cycles. Obviously this was a preliminary calculation that we did long ago and will need to be revisited.
The proper way to do this would be to use a frequent sampling interval for densities (e.g. 0.1 ps), since the storage requirements are minimal, and then compute the correlation times and statistical inefficiencies of this property for all mixtures in our set, reporting them in the first paper.
We don't even really need to record positions for this first paper characterizing deficiencies in mixture densities, but storing some (e.g. at 10 ps intervals) could be useful as preliminary work for a second paper focused on using reweighting to optimize parameters in a Bayesian framework. But we don't need to generate TB of data unnecessarily at this stage.
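As a sketch of that analysis (where `load_density_timeseries` is a hypothetical stand-in for however we end up storing the data), computing the statistical inefficiency and correlation time per mixture could look like:

```python
from pymbar import timeseries

sampling_interval = 0.1  # ps; frequent sampling is cheap for a scalar observable

# Hypothetical helper: load the stored density timeseries for one mixture.
density_timeseries = load_density_timeseries('binary_mixture_0')

# Statistical inefficiency g (in numbers of samples); the integrated
# correlation time tau satisfies g = 1 + 2*tau (in sampling intervals).
g = timeseries.statisticalInefficiency(density_timeseries)
tau = 0.5 * (g - 1.0) * sampling_interval  # ps
print('g = %.1f samples (%.2f ps); tau = %.2f ps' % (g, g * sampling_interval, tau))
```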
> Regarding precision modes: the default is Mixed, correct? IMHO we should not even be considering single, right?
I'm not certain; we should check this. I'm not convinced mixed precision works properly on all hardware, however, and I have found the precision improvement it provides to be unexpectedly poor, or perhaps even broken. See, for example: https://github.com/SimTk/openmm/issues/441
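If we do settle on mixed precision, it may be safest to request it explicitly rather than rely on a default. A sketch for the CUDA platform (the property was named 'CudaPrecision' in OpenMM versions of this era):

```python
from simtk.openmm import Platform

# Request mixed precision explicitly instead of relying on the platform default.
platform = Platform.getPlatformByName('CUDA')
platform.setPropertyDefaultValue('CudaPrecision', 'mixed')

# The platform is then passed when constructing the Simulation, e.g.
# app.Simulation(topology, system, integrator, platform)
```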
> Why do we need a density reporter if we already have a StateReporter that can be used to calculate the density?
Which reporter is this? Can you point me to the code?
> Regarding the length of the production simulations: if we implement the automated equilibration detection, we should also have a crude estimate of how long we need our production runs to be, right?
Not quite. It's more complex than this.
Our simulation needs to be T_equil + T_prod, where T_equil is long enough to reach the "typical equilibrium region" and T_prod depends on the statistical inefficiency once in the equilibrium region, the variance of the property of interest (in this case the instantaneous density), and the desired target statistical error for the estimate.
What are we aiming for in terms of target statistical error? Since densities are ~1 g/cm3, I imagine we want better than 0.1% error, perhaps 0.01% error (0.0001 g/cm3)?
I think we can keep a loose target (0.3%?) until we fix all the other bugs in our pipeline. I imagine we want to do things iteratively: run the entire benchmark, look for bugs, fix bugs, lower our tolerance, repeat...
> To clarify, my 3 ps guess is based on the inefficiency of the density, not the potential, so that will need to be repeated as well. I think we might as well go to 5 ps or so for now to avoid clogging our disk space.
Note that if we're computing average densities, the statistical inefficiency of the density, not the potential energy, is the relevant quantity.
Since the instantaneous density is just a single number per snapshot, there is no reason not to write it out frequently (e.g. every 0.1 or 0.25 ps) if doing so does not slow down the simulation.
Right, but if we're seeking to do reweighting based on the potential energy (or its components), then those would be relevant as well. Regardless, that's a future battle, so we can ignore it for now.
> I think it makes sense to triage this list in order of how quickly each task can be implemented.
Here's what I think we should do:
Simple, quick changes:

Use `detectEquilibration` to compute average densities, statistical errors, standard deviations, statistical inefficiencies, and equilibration times, and store these for each simulation:

```python
import numpy
from pymbar import timeseries

# Detect equilibrated region and compute statistical inefficiency of equilibrated region.
[t0, g, Neff] = timeseries.detectEquilibration(density_timeseries)
# Discard initial data to equilibration.
density_timeseries = density_timeseries[t0:]
# Compute expectation and statistical error estimate.
density_mean = density_timeseries.mean()
density_stddev = density_timeseries.std()
density_mean_stderr = density_timeseries.std() / numpy.sqrt(Neff)
```

After this, but before first paper:

If we keep using the `simtk.openmm.app.Simulation` and `Reporter` classes (I'm not sure we do; I've always thought these were pretty useless), refactor our tools into a `Reporter` for each physical property of interest so we can mix and match. For example, `DensityReporter`, `HeatOfVaporizationReporter`, etc.

Longer term:
> I think we can keep a loose target (0.3%?) until we fix all the other bugs in our pipeline. I imagine we want to do things iteratively: run the entire benchmark, look for bugs, fix bugs, lower our tolerance, repeat...
The DM40 density meter has an accuracy of 0.0001 g/cm3. I imagine we'd want to take a small but diverse set and see what it takes to get comparable statistical errors. From the data we collect on equilibration times, statistical inefficiencies, and standard deviations of the instantaneous density, we will have a good idea of the simulation lengths required. Then we can scale up.
Without data on these properties (equilibration times, statistical inefficiencies, and standard deviations of the instantaneous density), we have no way of estimating how long we need to run. So let's get this info first.
For reference, here's how we would estimate required simulation times:

T = Tequil + Tprod

where Tequil is the expected equilibration time required, and the production length must satisfy

Tprod = g * (stddev / stderr)^2

where `stddev` is the standard deviation of the property being averaged (the instantaneous density), `stderr` is the desired statistical precision, and `g` is the statistical inefficiency in units of time. (This follows from stderr = stddev / sqrt(Neff) with Neff = Tprod / g effectively uncorrelated samples.)
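To make that concrete, here's a back-of-the-envelope calculation with made-up but plausible numbers (the stddev and g values below are placeholders until we have real data):

```python
# Illustrative inputs only; replace with measured values once we have them.
stddev = 0.01    # g/cm3, standard deviation of the instantaneous density
g = 5.0          # ps, statistical inefficiency in time units
stderr = 1.0e-4  # g/cm3, target statistical error (DM40-level accuracy)

# Tprod = g * (stddev / stderr)^2
Tprod = g * (stddev / stderr)**2  # ps
print('Required production length: %.0f ns' % (Tprod / 1000.0))  # 50 ns here
```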
It would be useful to collect some discussion about the appropriate simulation procedures for the liquid mixtures project here and then do some tweaking of the code afterwards.
I'm looking through the current production code in use by @juliebehr: https://github.com/juliebehr/TrustButVerify/blob/liquids/trustbutverify/mixture_system.py
The current simulation protocol uses the following definitions:
I'll break my comments into sections.
Total simulation length. A total simulation length of 50 ns may be overkill, but it really depends on the correlation time for the box volume and the standard deviation of the instantaneous density. This is something we can get a feel for later on.
Equilibration length. We should never, never use a fixed equilibration time and assume it is sufficient for all cases. This is a dangerous practice, since some systems may relax more quickly than others, or some preparation steps may require longer times to relax to the typical equilibrium state than others. A better approach is to simply store all the data and examine it after the fact to automatically detect the region we discard to "equilibration". We should use my `detectEquilibration()` function from the `pymbar.timeseries` module to do this: https://github.com/choderalab/pymbar/blob/master/pymbar/timeseries.py#L724 I hate to be persnickety about this, but if we deviate from this recommendation, it essentially undermines my campaign to get other important simulation labs to embrace "best practices" behaviors here. I know this scheme isn't yet published, but we can put out a quick paper describing it over the next week or two.

Output frequency. Shouldn't the output frequency be 500, rather than 499, for a 1 ps output interval (presumably 500 steps at a 2 fs timestep)? Also, we really should have different output frequencies for different types of data: for densities, for example, a `DensityReporter` that just computes the instantaneous density and writes it to a file at desired intervals.

There are other issues we may need to look into eventually.
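For concreteness, a minimal sketch of that DensityReporter idea, assuming the standard simtk.openmm.app reporter protocol (the class and all details here are illustrative, not existing code):

```python
import simtk.unit as units

class DensityReporter(object):
    """Minimal sketch: write the instantaneous density to a file at fixed intervals."""

    def __init__(self, filename, reportInterval):
        self._out = open(filename, 'w')
        self._reportInterval = reportInterval
        self._totalMass = None

    def describeNextReport(self, simulation):
        # Report every _reportInterval steps; we need no positions, velocities,
        # forces, or energies, since box vectors are always present in the State.
        steps = self._reportInterval - simulation.currentStep % self._reportInterval
        return (steps, False, False, False, False)

    def report(self, simulation, state):
        # Cache the total system mass on first use.
        if self._totalMass is None:
            system = simulation.system
            self._totalMass = 0.0 * units.dalton
            for i in range(system.getNumParticles()):
                self._totalMass += system.getParticleMass(i)
        # density = total mass / (Avogadro's number * box volume)
        volume = state.getPeriodicBoxVolume()
        density = self._totalMass / units.AVOGADRO_CONSTANT_NA / volume
        self._out.write('%.6f\n' % density.value_in_unit(units.gram / units.centimeter**3))
```

Usage would be something like `simulation.reporters.append(DensityReporter('density.dat', 50))`, where 50 steps corresponds to a 0.1 ps reporting interval at a 2 fs timestep.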