choderalab / TrustButVerify

Unit Tests for Protein Force Fields
GNU General Public License v2.0

Liquid mixture discussion: Simulation procedures #7

Open jchodera opened 10 years ago

jchodera commented 10 years ago

It would be useful to collect some discussion about the appropriate simulation procedures for the liquid mixtures project here and then do some tweaking of the code afterwards.

I'm looking through the current production code in use by @juliebehr: https://github.com/juliebehr/TrustButVerify/blob/liquids/trustbutverify/mixture_system.py

The current simulation protocol uses the following definitions:

N_STEPS_MIXTURES = 25000000 # 50 ns
N_EQUIL_STEPS_MIXTURES = 5000000 # 5ns
OUTPUT_FREQUENCY_MIXTURES = 499 #500
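
(For orientation, a rough unit check of these settings, assuming the 2 fs timestep implied by the "25000000 steps = 50 ns" comment; if a different timestep is in use, adjust accordingly:)

TIMESTEP = 0.002                           # ps (2 fs), inferred from "25000000 steps = 50 ns"
production_length = 25000000 * TIMESTEP    # = 50,000 ps = 50 ns
equilibration_length = 5000000 * TIMESTEP  # = 10,000 ps = 10 ns (not 5 ns, if the timestep really is 2 fs)
output_interval = 499 * TIMESTEP           # = 0.998 ps, just shy of the intended 1 ps (see below)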

I'll break my comments into sections.

Total simulation length. A total simulation length of 50 ns may be overkill, but it really depends on the correlation time for the box volume and the standard deviation of the instantaneous density. This is something we can get a feel for later on.

Equilibration length. We should never, never use a fixed equilibration time and assume it is sufficient for all cases. This is a dangerous practice, since some systems may relax more quickly than others, or some preparation steps may require longer to relax to the typical equilibrium state than others. A better approach is to simply store all the data and examine it after the fact to automatically detect the region we discard to "equilibration". We should use my detectEquilibration() function from the pymbar.timeseries module to do this: https://github.com/choderalab/pymbar/blob/master/pymbar/timeseries.py#L724

I hate to be persnickety about this, but if we deviate from this recommendation, it essentially undermines my campaign to get other important simulation labs to embrace "best practices" behaviors here. I know this scheme isn't yet published, but we can put out a quick paper describing it over the next week or two.

Output frequency. Shouldn't the output frequency be 500, rather than 499, for a 1 ps output interval? Also, we really should have different output frequencies for different types of data:

There are other issues we may need to look into eventually:

kyleabeauchamp commented 10 years ago

So I agree that we want to avoid using the fixed equilibration length, but to really make your scheme widely used we need to embed it in the simulation software. It really needs to be fully automated; otherwise people won't bother. It doesn't have to be hard: we could probably implement it in OpenMM via some sort of EquilibrationReporter object that accumulates various 1D observables and runs equilibration detection on them. We could include this reporter in PyMBAR and have it conda-installable, so it wouldn't be burdensome for people to use, at least in OpenMM.
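
Roughly, I'm imagining something like the minimal sketch below, assuming the standard OpenMM custom-reporter interface (describeNextReport/report) and pymbar's detectEquilibration(); the EquilibrationReporter name and its details are hypothetical:

import numpy as np
from simtk import unit
from pymbar import timeseries

class EquilibrationReporter(object):
    """Hypothetical sketch: accumulate a 1D observable (here the instantaneous
    density) during the simulation and run equilibration detection afterwards."""

    def __init__(self, reportInterval):
        self._reportInterval = reportInterval
        self._densities = []  # instantaneous densities in g/cm^3

    def describeNextReport(self, simulation):
        steps = self._reportInterval - simulation.currentStep % self._reportInterval
        # (steps until next report, positions, velocities, forces, energies)
        return (steps, False, False, False, False)

    def report(self, simulation, state):
        # Total mass (g) over instantaneous box volume (cm^3); the mass could be cached.
        system = simulation.system
        mass_in_daltons = sum(system.getParticleMass(i).value_in_unit(unit.dalton)
                              for i in range(system.getNumParticles()))
        volume_in_nm3 = state.getPeriodicBoxVolume().value_in_unit(unit.nanometer**3)
        # 1 dalton = 1.660539e-24 g; 1 nm^3 = 1e-21 cm^3
        self._densities.append(mass_in_daltons * 1.660539e-24 / (volume_in_nm3 * 1e-21))

    def detect_equilibration(self):
        # Post-processing step: returns (t0, g, Neff) from the accumulated timeseries.
        return timeseries.detectEquilibration(np.array(self._densities))

The same pattern would work for any scalar observable; the only OpenMM-specific pieces are the reporter interface and the unit bookkeeping.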

kyleabeauchamp commented 10 years ago

Regarding the output frequency, we have already done the statistical inefficiency calculation for a few systems. A reasonable spacing between uncorrelated samples would be ~3 ps; 10 ps is probably wasting cycles. Obviously this was a preliminary calculation that we did long ago and will need to be revisited.

kyleabeauchamp commented 10 years ago

To clarify, my 3 ps guess is based on the inefficiency of the density, not the potential, so that will need to be repeated as well. I think we might as well go to 5 ps or so for now to avoid clogging our disk space.
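
(For reference, the calculation behind that guess is just something like the sketch below, using pymbar's statisticalInefficiency(); the file name and output interval are placeholders:)

import numpy
from pymbar import timeseries

# Hypothetical: instantaneous densities recorded every `interval_ps` picoseconds.
density_timeseries = numpy.loadtxt('densities.txt')
interval_ps = 0.5
g = timeseries.statisticalInefficiency(density_timeseries)  # statistical inefficiency, in samples
uncorrelated_spacing_ps = g * interval_ps                   # effective spacing between uncorrelated samples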

kyleabeauchamp commented 10 years ago

Regarding the length of the production simulations: if we implement the automated equilibration detection, we should also have a crude estimate of how long we need our production runs to be, right?

kyleabeauchamp commented 10 years ago

Why do we need a density reporter if we already have a StateReporter that can be used to calculate the density?

kyleabeauchamp commented 10 years ago

Regarding precision modes: the default is Mixed, correct? IMHO we should not even be considering single, right?

kyleabeauchamp commented 10 years ago

I think it makes sense to triage this list in order of how quickly each task can be implemented. Below is my interpretation:

  1. Change the output frequency of coordinates to, say, 5 ps (a compromise)
  2. Benchmark the speed of the density reporter to optimize the density output frequency without hurting performance (transferring the State from the GPU can be slow)
  3. Evaluate PME settings. Via reweighting?
  4. Evaluate precision modes. I think
  5. Implement a Reporter-based equilibration detection module for OpenMM?

jchodera commented 10 years ago

So I agree that we want to avoid using the fixed equilibration length, but to really make your scheme widely used we need to embed it in the simulation software. It really needs to be fully automated; otherwise people won't bother. It doesn't have to be hard: we could probably implement it in OpenMM via some sort of EquilibrationReporter object that accumulates various 1D observables and runs equilibration detection on them. We could include this reporter in PyMBAR and have it conda-installable, so it wouldn't be burdensome for people to use, at least in OpenMM.

This isn't quite appropriate or practical, since the equilibration detection is a purely post-processing method. If we're interested in the density, we simply store the entire density timeseries during the simulation at some convenient sampling interval and then use the pymbar.timeseries.detectEquilibration() method to extract the amount of data to discard to equilibration and the statistical inefficiency of the remainder. Usage is like this:

import numpy
from pymbar import timeseries
# Detect equilibrated region and compute statistical inefficiency of equilibrated region.
[t0, g, Neff] = timeseries.detectEquilibration(density_timeseries)
# Discard initial data to equilibration.
density_timeseries = density_timeseries[t0:]
# Compute expectation and statistical error estimate.
density_mean = density_timeseries.mean()
density_mean_stderr = density_timeseries.std() / numpy.sqrt(Neff)

We are already conda-packaging pymbar and including it as part of Omnia. Unless we also construct an automated analysis facility for the OpenMM app, there's no tighter way to integrate it, other than perhaps to create a single function that implements the small code block above.
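
Something like this hypothetical helper would be all it takes (the function name is just a placeholder):

import numpy
from pymbar import timeseries

def estimate_expectation(data):
    """Sketch of a single-call analysis on a 1D numpy array: discard the
    automatically detected equilibration region, then return (mean, stderr, t0, g, Neff)."""
    [t0, g, Neff] = timeseries.detectEquilibration(data)
    production = data[t0:]
    return production.mean(), production.std() / numpy.sqrt(Neff), t0, g, Neff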

Separately, I am lobbying people like Schrödinger to do this automatically.

And writing a short 2-4 page paper soon would be very useful as well.

jchodera commented 10 years ago

Regarding the output frequency, we have already done the statistical inefficiency calculation for a few systems. A reasonable spacing between uncorrelated samples would be ~3 ps; 10 ps is probably wasting cycles. Obviously this was a preliminary calculation that we did long ago and will need to be revisited.

The proper way to do this would be to use a frequent sampling interval for densities (e.g. 0.1 ps), since the storage requirements are minimal, and then compute the correlation times and statistical inefficiencies of this property for all mixtures in our set when reporting them in the first paper.

We don't even really need to record positions for this first paper characterizing deficiencies in mixture densities, but storing some (e.g. at 10 ps intervals) could be useful as preliminary work for a second paper focused on using reweighting to optimize parameters in a Bayesian framework. But we don't need to generate TB of data unnecessarily at this stage.
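
As a sketch of what that could look like with the app's reporters (assuming a 2 fs timestep, so 50 steps = 0.1 ps and 5000 steps = 10 ps, and an existing app.Simulation object; file names are placeholders):

from simtk.openmm import app

# Instantaneous density (plus time) every 0.1 ps; this is just one number per report.
simulation.reporters.append(app.StateDataReporter('density.csv', 50, time=True, density=True))

# Positions only every 10 ps, and only if we decide we want them at all.
simulation.reporters.append(app.DCDReporter('trajectory.dcd', 5000))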

jchodera commented 10 years ago

Regarding precision modes: the default is Mixed, correct? IMHO we should not even be considering single, right?

I'm not certain. We should check this. I'm not convinced this works properly on all hardware, however, and I have found the precision improvement provided to be unexpectedly poor or perhaps even broken. See, for example: https://github.com/SimTk/openmm/issues/441

jchodera commented 10 years ago

Why do we need a density reporter if we already have a StateReporter that can be used to calculate the density?

Which reporter is this? Can you point me to the code?

kyleabeauchamp commented 10 years ago

https://github.com/SimTk/openmm/blob/master/wrappers/python/simtk/openmm/app/statedatareporter.py#L71

jchodera commented 10 years ago

Regarding the length of the production simulations: if we implement the automated equilibration detection, we should also have a crude estimate of how long we need our production runs to be, right?

Not quite. It's more complex than this.

Our simulation needs to be T_equil + T_prod, where T_equil is long enough to reach the "typical equilibrium region" and T_prod depends on the statistical inefficiency once in the equilibrium region, the variance of the property of interest (in this case the instantaneous density), and the desired target statistical error for the estimate.

What are we aiming for in terms of target statistical error? Since densities are ~1 g/cm3, I imagine we want better than 0.1% error, perhaps 0.01% error (0.0001 g/cm3)?

kyleabeauchamp commented 10 years ago

I think we can keep a loose target (0.3%?) until we fix all the other bugs in our pipeline. I imagine we want to do things iteratively: run the entire benchmark, look for bugs, fix bugs, lower our tolerance, repeat...

jchodera commented 10 years ago

To clarify, my 3 ps guess is based on the inefficiency of the density, not the potential, so that will need to be repeated as well. I think we might as well go to 5 ps or so for now to avoid clogging our disk space.

Note that if we're computing average densities, the statistical inefficiency of the density, not the potential energy, is the relevant quantity.

Since the instantaneous density is just a single number per snapshot, there is no reason not to write it out frequently (e.g. every 0.1 or 0.25 ps) if doing so does not slow down the simulation.

kyleabeauchamp commented 10 years ago

Right, but if we're seeking to do reweighting based on the potential energy (or its components), then those would be relevant as well. Regardless, that's a future battle, so we can ignore it for now.

jchodera commented 10 years ago

I think it makes sense to triage this list in order of how quickly each task can be implemented.

Here's what I think we should do:

Simple, quick changes:

import numpy
from pymbar import timeseries
# Detect equilibrated region and compute statistical inefficiency of equilibrated region.
[t0, g, Neff] = timeseries.detectEquilibration(density_timeseries)
# Discard initial data to equilibration.
density_timeseries = density_timeseries[t0:]
# Compute expectation and statistical error estimate.
density_mean = density_timeseries.mean()
density_stddev = density_timeseries.std()
density_mean_stderr = density_timeseries.std() / numpy.sqrt(Neff)

After this, but before first paper:

Longer term:

jchodera commented 10 years ago

I think we can keep a loose target (0.3%?) until we fix all the other bugs in our pipeline. I imagine we want to do things iteratively: run the entire benchmark, look for bugs, fix bugs, lower our tolerance, repeat...

The DM40 density meter has an accuracy of 0.0001 g/cm3. I imagine we'd want to take a small but diverse set and see what it takes to get comparable statistical errors. From the data we collect on equilibration times, statistical inefficiencies, and standard deviations of the instantaneous density, we will have a good idea of the simulation lengths required. Then we can scale up.

Without data on these properties (equilibration times, statistical inefficiencies, and standard deviations of the instantaneous density), we have no way of estimating how long we need to run. So let's get this info first.

jchodera commented 10 years ago

For reference, here's how we would estimate required simulation times:

T = Tequil + Tprod
Tequil is the expected equilibration time required
Tprod = g * (stddev / stderr)^2

where stddev is the standard deviation of the property being averaged (the instantaneous density), stderr is the desired statistical precision, and g is the statistical inefficiency of that property in time units. This follows from stderr = stddev / sqrt(Neff) with Neff = Tprod / g effectively uncorrelated samples.
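
As a purely illustrative example with made-up numbers:

# Hypothetical values for one mixture, just to show the arithmetic:
stddev = 0.01     # g/cm3, standard deviation of the instantaneous density
g = 3.0           # ps, statistical inefficiency of the density
stderr = 0.0001   # g/cm3, target statistical error (DM40-level accuracy)
Tprod = g * (stddev / stderr)**2   # = 30,000 ps = 30 ns of production data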