choderalab / pymbar

Python implementation of the multistate Bennett acceptance ratio (MBAR)
http://pymbar.readthedocs.io
MIT License
226 stars 89 forks

Estimators for statisticalInefficiency #131

Closed rmcgibbo closed 9 years ago

rmcgibbo commented 9 years ago

Have you guys tried using other estimators for the statistical inefficiency / integrated autocorrelation time? I looked around a little bit -- there are quite a few ways of doing it. There is some empirical comparison of various estimators on arXiv. It looks to me like the most widely used method is from the CODA (convergence diagnostics) package in R. There's an article about CODA here. The function effectiveSize is basically just len(x)/statisticalInefficiency(x).

The method in CODA is implemented by fitting an AR(p) model to the time series with method of moments (Yule-Walker), with p chosen by AIC, and then using the spectral density at 0.
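A minimal sketch of that recipe as I read it (my own illustration, not CODA's actual code; all names are mine): fit AR(p) models by Yule-Walker for a range of orders, pick p by AIC, and take g = S(0)/gamma(0), where S(0) is the fitted spectral density at frequency zero.

```python
import numpy as np

def ar_statistical_inefficiency(x, max_order=10):
    """Estimate the statistical inefficiency g = N / N_eff in the spirit of
    CODA's effectiveSize: fit AR(p) models by Yule-Walker, choose p by AIC,
    and evaluate the fitted spectral density at frequency zero."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # Biased autocovariance estimates gamma(0), ..., gamma(max_order).
    gamma = np.array([xc[: n - k] @ xc[k:] / n for k in range(max_order + 1)])
    best_aic, best_g = np.inf, 1.0
    for p in range(max_order + 1):
        if p == 0:
            phi, sigma2 = np.zeros(0), gamma[0]
        else:
            # Yule-Walker equations: Toeplitz(gamma[0..p-1]) @ phi = gamma[1..p]
            R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
            phi = np.linalg.solve(R, gamma[1 : p + 1])
            sigma2 = gamma[0] - phi @ gamma[1 : p + 1]  # innovation variance
        aic = n * np.log(sigma2) + 2 * p
        if aic < best_aic:
            best_aic = aic
            # Spectral density at 0: S(0) = sigma2 / (1 - sum(phi))^2,
            # so g = S(0) / gamma(0) and N_eff = N / g.
            best_g = sigma2 / (1.0 - phi.sum()) ** 2 / gamma[0]
    return best_g
```

On an AR(1) series with coefficient phi, this should recover something close to the exact inefficiency (1 + phi) / (1 - phi).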

kyleabeauchamp commented 9 years ago

I think we're interested in benchmarking this stuff and using the benchmark to improve the "standard workflow" in the MD and free energy communities. I personally haven't done that much.

kyleabeauchamp commented 9 years ago

Also, if we're doing equilibration detection we need to estimate both the inefficiency and the burn-in period. The simplest approach is a grid scan, but this does require a reasonably fast estimator of the inefficiency.
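For concreteness, the grid scan could look like the sketch below; `g_estimator` is a stand-in for whichever inefficiency estimator we settle on, and `stride` controls the grid spacing (both names are hypothetical, not existing pymbar API):

```python
def detect_equilibration(x, g_estimator, stride=10):
    """Grid-scan candidate burn-in points t0; for each, estimate the
    statistical inefficiency g of the production region x[t0:] and keep
    the t0 that maximizes the effective sample size N_eff = (N - t0) / g.
    `g_estimator` is a placeholder for whichever estimator we pick."""
    n = len(x)
    best_t0, best_g, best_neff = 0, 1.0, 0.0
    for t0 in range(0, n - 1, stride):
        g = g_estimator(x[t0:])
        n_eff = (n - t0) / g
        if n_eff > best_neff:
            best_t0, best_g, best_neff = t0, g, n_eff
    return best_t0, best_g, best_neff
```

This is why estimator speed matters: the estimator is re-run once per grid point, so its cost multiplies by the number of candidate burn-in times.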

jchodera commented 9 years ago

As a graduate student, I spent a great deal of time investigating a number of different schemes for estimating statistical inefficiencies in collaboration with Bill Swope at IBM Almaden Research Center. The only published results of those investigations were summarized in Sections 2.4 and 5.2 of my parallel tempering WHAM paper, but I imagine there are a number of methods used by the statistical inference community we hadn't evaluated since our focus was on methods that had been employed by the Monte Carlo and molecular dynamics communities. The evaluation was also not systematic by any means, since we merely wanted to find a practical and robust method that was "good enough" for most purposes.

I'd love to see a systematic evaluation, in which we create a standard test suite of "typical" data (with enough data to break it into statistically uncorrelated pieces to perform rigorous tests of the reliability of predicted statistical uncertainties), and evaluate a set of standard methods against this test suite to compare reliability and speed of uncertainty estimates.

I'm still skeptical about modeling molecular simulation timeseries data as an autoregressive process because it assumes so much about the statistics of the underlying process that simply cannot be true. On the other hand, fitting the asymptotic exponential behavior of the autocorrelation function makes fewer assumptions about the underlying statistics. But I'd love to read more about this, or hear about how well it does.

Perhaps we should split out the timeseries module into a new project focused just on correlation and uncertainty analysis of MCMC-like data?

rmcgibbo commented 9 years ago

Yeah, there are quite a few other estimators covered in Madeleine Thompson's technical report, so this isn't the only game in town. I guess my interest here is in convergence diagnostics for MCMC, so this is right up the alley of what this community has worked on. Also, coda is a very widely used R package.

rmcgibbo commented 9 years ago

I'll try out a few of the other estimators. Let me know if you have any good ideas for other benchmark datasets.

kyleabeauchamp commented 9 years ago

So I'm presently using the equilibration_detection stuff to automatically discard the equilibration period and then run neat liquid simulations to convergence (stderr of density ~ 2E-4 g / mL): https://github.com/choderalab/TrustButVerify/blob/master/trustbutverify/mixture_system.py#L171

Not sure if it's a good benchmark, but it is a "real world" benchmark. Obviously I'm convolving the two problems of burn-in and statistical inefficiency, but IMHO they're often observed together in other MCMC applications as well.

rmcgibbo commented 9 years ago

Yeah, IMO they're probably the same problem in the end -- if you have a good estimator for the integrated autocorrelation time, I think you can figure out both the burn-in and the error bars. The challenge with benchmarking is that one would like to use at least some benchmarks where the right answer is analytically known, but those tend to have quite simple autocorrelation structure.
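One family of analytically solvable benchmarks is the AR(1) process, whose statistical inefficiency is exactly (1 + phi) / (1 - phi), so any estimator can be scored against a closed-form answer. A generator sketch (function name is mine, just for illustration):

```python
import numpy as np

def ar1_benchmark(phi, n, rng):
    """Generate an AR(1) series x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1),
    started from its stationary distribution, together with the exact
    statistical inefficiency g = (1 + phi) / (1 - phi)."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    # Draw x_0 from the stationary distribution, variance 1 / (1 - phi^2).
    x[0] = e[0] / np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x, (1.0 + phi) / (1.0 - phi)
```

As noted above, though, the autocorrelation structure here is a single exponential, which is much simpler than real MD data.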

rmcgibbo commented 9 years ago

Okay, I now have 5 different estimators implemented, including one that's equivalent to the one in pymbar.timeseries. The remaining 4 are tested against reference implementations in R. I'll merge the PR shortly (still tricky to get R packages running on Travis), and then find some data to play with and think about how to benchmark.

jchodera commented 9 years ago

I've finally had a chance to read the arXiv paper you linked to.

The method implemented in pymbar.timeseries.statisticalInefficiency() is very close to the initial positive sequence (IPS) estimator described in Section 2.3. We also truncate the sum when the autocorrelation becomes negative, though the fast=True scheme also uses a multiscale integration method for speed. As the arXiv paper demonstrates, this method performs well except in cases where the assumption that the autocorrelation is always positive is violated.
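The per-lag truncation rule described above can be sketched roughly as follows (a simplified illustration, not pymbar's actual implementation; in particular it omits the fast=True multiscale scheme):

```python
import numpy as np

def ips_like_inefficiency(x):
    """Accumulate g = 1 + 2 * sum_t (1 - t/N) * rho(t), stopping at the
    first lag where the estimated autocorrelation goes negative."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    c0 = xc @ xc / n  # sample variance (biased normalization)
    g = 1.0
    for t in range(1, n):
        rho = (xc[: n - t] @ xc[t:]) / n / c0  # normalized autocorrelation
        if rho <= 0.0:
            break  # truncate the sum at the first negative estimate
        g += 2.0 * rho * (1.0 - t / n)
    return max(g, 1.0)
```

As the arXiv paper notes, this rule runs into trouble when the true autocorrelation function has genuinely negative regions, since the sum is truncated before those contributions are counted.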

I'm surprised that the AR process method is so effective. The availability of confidence intervals could be useful, but they seem to be misleadingly small in several of the datasets resulting from actual MCMC simulations (Fig 4, ARMS-Bimodal and Stepout-Log-Var), suggesting they may be useless in practice. Despite this, the AR estimates (Fig 2) seem to be about as reliable as the ICS/IPS scheme, so there may be advantages if the AR process method is computationally faster than the ICS/IPS schemes.

Would be curious to hear what your investigations reveal!

rmcgibbo commented 9 years ago

Right. The Geyer IPS estimator is very close to what you've been doing. Practically, my expectation is that they yield basically the same thing, but the IPS estimator is provably consistent, which is nicer.
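The difference is in the truncation rule: Geyer's IPS sums the autocorrelation function in consecutive even/odd lag pairs and truncates at the first negative pair, rather than at the first negative single lag. A rough sketch (my own illustration, not the coda or pymbar code):

```python
import numpy as np

def geyer_ips_inefficiency(x):
    """Geyer initial positive sequence (IPS) estimator: accumulate the
    autocorrelation function in even/odd lag pairs
    Gamma_k = rho(2k) + rho(2k+1), stop at the first negative pair, and
    return g = 2 * sum_k Gamma_k - 1 (i.e. 1 + 2 * sum_{t>=1} rho(t))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    c0 = xc @ xc / n

    def rho(t):
        # Normalized autocorrelation at lag t (biased normalization).
        return (xc[: n - t] @ xc[t:]) / n / c0

    total = 0.0
    k = 0
    while 2 * k + 1 < n:
        pair = rho(2 * k) + rho(2 * k + 1)
        if pair < 0.0:
            break  # truncate at the first negative even/odd pair
        total += pair
        k += 1
    return max(2.0 * total - 1.0, 1.0)
```

The pairing exploits the fact that for a reversible Markov chain the true pair sums are provably nonnegative, which is what gives the consistency result.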

I'm surprised that the AR process method is so effective.

I think I can explain this, based on the structure of the Yule-Walker equations, but I don't have anything concrete. I've played around a little with the estimators that I've implemented, but so far only with toy datasets.