choderalab / pymbar

Python implementation of the multistate Bennett acceptance ratio (MBAR)
http://pymbar.readthedocs.io
MIT License

MBAR gives a statistical error orders of magnitude larger than the bootstrap error. #519

Open xiki-tempula opened 8 months ago

xiki-tempula commented 8 months ago

The statistical error is 40.48 while the bootstrap error is 0.35. The code to reproduce:

import pymbar
import numpy as np

u_nk = np.load("u_nk.npy")  # reduced potential matrix (attached below)
N_k = [10000] * 12          # 10000 samples from each of the 12 states

# First estimate: analytical (asymptotic) MBAR uncertainty, no bootstrapping
mbar = pymbar.MBAR(
    u_nk,
    N_k,
    maximum_iterations=10000,
    relative_tolerance=1e-07,
    verbose=True,
    initial_f_k=None,
    solver_protocol="robust",
    n_bootstraps=0,
)
out = mbar.compute_free_energy_differences(return_theta=True, uncertainty_method=None)

print(out["dDelta_f"][0][-1])  # analytical uncertainty: 40.48

# Second estimate: bootstrap uncertainty from 50 resamples
mbar = pymbar.MBAR(
    u_nk,
    N_k,
    maximum_iterations=10000,
    relative_tolerance=1e-07,
    verbose=True,
    initial_f_k=None,
    solver_protocol="robust",
    n_bootstraps=50,
)
out = mbar.compute_free_energy_differences(
    return_theta=True, uncertainty_method="bootstrap"
)
print(out["dDelta_f"][0][-1])  # bootstrap uncertainty: 0.35

u_nk.npy.zip

mikemhenry commented 8 months ago

Have you had a chance to check if you see the same behavior with pymbar 3?
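For what it's worth, here is a minimal sketch of how the same check could be set up with the pymbar 3 API, assuming a 3.x installation; pymbar 3 has no n_bootstraps option and its solver_protocol takes a different form than the "robust" string, so only the analytical estimate is shown, and the return format of getFreeEnergyDifferences differs between 3.x minor versions:

import numpy as np
import pymbar  # assuming a pymbar 3.x installation here

u_nk = np.load("u_nk.npy")
N_k = [10000] * 12

mbar = pymbar.MBAR(
    u_nk,
    N_k,
    maximum_iterations=10000,
    relative_tolerance=1e-07,
    verbose=True,
    initial_f_k=None,
)

# analytical uncertainty only; later 3.x releases return a dict,
# earlier ones a tuple, so handle both
result = mbar.getFreeEnergyDifferences(compute_uncertainty=True)
if isinstance(result, dict):
    dDelta_f = result["dDelta_f"]
else:
    dDelta_f = result[1]
print(dDelta_f[0][-1])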

mrshirts commented 8 months ago

So this is a fascinating case. I am not sure that it's necessarily wrong. Interestingly, the first state has nearly, but not exactly, 0 overlap with the other 11 states (see mbar.compute_overlap()['matrix']). So the analytical estimate is definitely off for that state. However, bootstrapping shows very little variation.
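For anyone who wants to reproduce that check, here is a minimal sketch of inspecting the overlap of state 0 with the other states, reusing the mbar object from the first snippet above and the compute_overlap() call referred to here:

# overlap matrix: entry [i, j] is the overlap between states i and j
overlap = mbar.compute_overlap()["matrix"]

# overlap of state 0 with every other state; values near zero mean the
# analytical (asymptotic) uncertainty estimates involving state 0 are unreliable
print(overlap[0, :])
print(overlap[0, 1:].max())  # largest overlap of state 0 with any other state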

One weird thing is the u_nk matrix. After sample 11, only state 0 has any energy - the rest are zeros. This seems odd, and likely unintended? Also, in sample 0, both states 0 and 11 have zero energy, which seems weird, since state 0 has large energy for the rest of the run.

xiki-tempula commented 8 months ago

@mrshirts Thanks for taking a look. When you say that

After sample 11, only state 0 has any energy - the rest are zeros.

Which row or column do you mean?

mrshirts commented 8 months ago

1) Sorry for being so unclear! 2) I gave some numbers from a case where I was manipulating the data to test a hypothesis.

The first paragraph above is completely correct. I.e. the first state has very little overlap with the other states, so the analytical estimate is going to be off. But, surprisingly, that doesn't rule out the bootstrap being correct in some situations. The interesting thing is that the covariance matrices from the analytical case and the bootstrap are very close to each other (within ~10%, about what you would expect between variance methods), except for the variances from state 0 to the other states (which are very different). Note also that the bootstrap estimate of the variance to state 0 from the other states is much larger than the estimate of the variance between the other 11 states, but 1 order of magnitude different instead of 3 orders of magnitude different.
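To make that comparison concrete, here is a minimal sketch of putting the two uncertainty matrices side by side; out_analytic and out_boot are hypothetical names for the outputs of the two compute_free_energy_differences calls in the original snippet (uncertainty_method=None and uncertainty_method="bootstrap", respectively):

import numpy as np

# dDelta_f is the full K x K matrix of free-energy-difference uncertainties
d_analytic = out_analytic["dDelta_f"]  # analytical (asymptotic) estimate
d_boot = out_boot["dDelta_f"]          # bootstrap estimate

# ratio of analytical to bootstrap uncertainties, skipping the zero diagonal;
# roughly 1 (within ~10%) for well-overlapping states, but far from 1 in
# row/column 0 for this data set
K = d_analytic.shape[0]
off_diag = ~np.eye(K, dtype=bool)
ratio = np.full((K, K), np.nan)
ratio[off_diag] = d_analytic[off_diag] / d_boot[off_diag]
np.set_printoptions(precision=2, suppress=True)
print(ratio)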

The 2nd paragraph was wrong: I was trying to test a hypothesis from the first observation (basically, trying to change the reference state), but was failing to construct the correct u_kn matrix, so that part is wrong. Apologies for that.

The hypothesis was: "Is the bootstrap variation low because I used the poorly overlapping state as the reference state, and would the variances be larger if I relabeled the states without changing the physics?" When I successfully switched the 0 and 11 state labels, I found that the uncertainties were essentially preserved (a huge analytical uncertainty from state 11, a bootstrap uncertainty to state 11 of about 0.3), meaning that the low variance from the bootstrap is NOT an artifact of state 0 being the reference state.
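For completeness, a minimal sketch of what that relabeling could look like (an assumed construction, not the exact code used for the test above): swapping the state-0 and state-11 labels means permuting both the evaluation-state rows of the matrix and the corresponding blocks of sample columns, together with N_k.

import numpy as np
import pymbar

u_nk = np.load("u_nk.npy")
N_k = [10000] * 12

perm = list(range(12))
perm[0], perm[11] = perm[11], perm[0]  # swap the labels of states 0 and 11

# rows: reduced potentials evaluated at each (relabeled) state
u_perm = u_nk[perm, :]

# columns: samples are stored in blocks of N_k[k] drawn from state k,
# so the blocks must be reordered with the same permutation
blocks = np.split(u_perm, np.cumsum(N_k)[:-1], axis=1)
u_perm = np.concatenate([blocks[i] for i in perm], axis=1)
N_k_perm = [N_k[i] for i in perm]

# MBAR on the relabeled data describes the same physics, so the
# uncertainties should be essentially unchanged
mbar_perm = pymbar.MBAR(u_perm, N_k_perm, solver_protocol="robust", n_bootstraps=50)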

So, conclusion: I think this is a case where the bootstrap is right (or much closer to right) and the analytical estimate is wrong. But if you want better results, and uncertainties that agree, it is better to run simulations where there is good overlap between all states.