jaclark5 closed this 2 months ago
The derivation in this paper is not really correct, or rather, it has an extra term whose expectation is zero and only adds noise to the calculation (I talked to them a while back about this).
Are there reasons the existing MBAR tools for entropy and enthalpy aren't useful/working? Or just not exposed correctly?
(i.e. MBAR with 2 states gives the same expectations as BAR, you use the 2 state MBAR entropy/enthalpy calculations).
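The equivalence claimed here can be illustrated without any library at all. Below is a minimal numpy sketch, using synthetic two-state harmonic-oscillator data and hand-rolled solvers (not alchemlyb or pymbar code): the 2-state MBAR self-consistent solution and the root of Bennett's implicit BAR equation coincide to solver tolerance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two harmonic states u_k(x) = (x - mu_k)^2 / 2 in reduced units; they differ
# only by a translation, so the exact free energy difference is zero.
mu = np.array([0.0, 1.0])
n = 5000
x = np.concatenate([rng.normal(mu[0], 1.0, n), rng.normal(mu[1], 1.0, n)])
u_kn = 0.5 * (x[None, :] - mu[:, None]) ** 2  # reduced potentials, shape (2, 2n)
N_k = np.array([n, n])

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)

# Self-consistent 2-state MBAR iteration for the reduced free energies f_k.
f = np.zeros(2)
for _ in range(2000):
    log_denom = logsumexp(np.log(N_k)[:, None] + f[:, None] - u_kn, axis=0)
    f_new = -logsumexp(-u_kn - log_denom[None, :], axis=1)
    f_new -= f_new[0]
    if np.max(np.abs(f_new - f)) < 1e-12:
        f = f_new
        break
    f = f_new
df_mbar = f[1]

# BAR: the same solution satisfies Bennett's implicit equation, solved here
# by bisection on the forward/reverse work distributions.
w_F = u_kn[1, :n] - u_kn[0, :n]  # forward work, samples from state 0
w_R = u_kn[0, n:] - u_kn[1, n:]  # reverse work, samples from state 1
M = np.log(len(w_F) / len(w_R))

def bar_residual(df):
    # Monotonically increasing in df; its root is the BAR estimate.
    return (np.sum(1.0 / (1.0 + np.exp(M + w_F - df)))
            - np.sum(1.0 / (1.0 + np.exp(-M + w_R + df))))

a, b = -10.0, 10.0
for _ in range(200):
    mid = 0.5 * (a + b)
    if bar_residual(mid) > 0.0:
        b = mid
    else:
        a = mid
df_bar = 0.5 * (a + b)

print(df_mbar, df_bar)  # identical to solver tolerance
```

The analytical error estimates, however, differ between the two routes, which is the distinction discussed below.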
@mrshirts It's exposed fine, but I have a use case where the free energy estimate from MBAR isn't aligning with the BAR and TI estimates, which are in close agreement with each other. I then decided to stick with BAR, but I still want the enthalpy/entropy contributions.
That's a good point about the 2-state MBAR, I'll check that out.
This PR also contains some small changes to improve the documentation and remove someone's .vscode settings. I can close this PR and open another one with those.
Alternatively, if the derivation is somewhat correct, can we adapt it to be completely correct?
> but I have a use case where the free energy estimate from MBAR isn't aligning with the BAR and TI estimates, which are in close agreement with each other. I then decided to stick with BAR, but I still want the enthalpy/entropy contributions.
BAR and MBAR not agreeing in value would generally indicate that either something is suspicious with the data or the solver isn't working correctly. I would be happy to take a look if that's the case!
Also, BAR and MBAR have DIFFERENT error estimates (i.e. if you do 2-state MBAR and BAR you should get the SAME number, but DIFFERENT error estimates, especially for poor overlap), and that is something I have been working to address. Bootstrap errors will be the same for both (and fall somewhere in between the two analytical error estimates).
> Alternatively, if the derivation is somewhat correct, can we adapt it to be completely correct?
IIRC (it's been quite a while since I worked through it), it has a term whose expectation is zero - if you leave that term out, it's the same as the MBAR 2-state entropy/enthalpy estimate.
> This PR also contains some small changes to improve documentation and removing someone's .vscode settings. I can close this PR and open another one with those.
That would be great!
> IIRC (it's been quite a while since I worked through it), it has a term whose expectation is zero - if you leave that term out, it's the same as the MBAR 2-state entropy/enthalpy estimate.
That does sound like a good check. Makes me wonder if this is something that we are testing in our CI or unit tests. If not, maybe that's also worth adding to the tests.
@mrshirts @ijpulidos I implemented the option to output the enthalpy/entropy for MBAR and BAR (via 2-state MBAR) in a PR to alchemlyb.
From the solvated benzene example in the tests, the cumulative free energies for MBAR and stitched 2-state MBAR are close:

MBAR:  [0.0, 1.619069, 2.557990, 2.986302, 3.041156]
"BAR": [0.0, 1.609778, 2.547866, 2.984183, 3.044385]
The entropies and enthalpies are less satisfying:

MBAR U:  [0.0, 1.241970, 1.695000, 1.706555, 1.388906]
"BAR" U: [0.0, 1.337356, 2.140400, 2.512384, 2.563935]
MBAR S:  [0.0, -0.377099, -0.862990, -1.279746, -1.652250]
"BAR" S: [0.0, -0.272422, -0.407467, -0.471798, -0.480450]
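One quick consistency check on these numbers: in the reduced units reported, the decomposition must satisfy Δf = Δu − Δs term by term, and both columns do, to rounding precision:

```python
import numpy as np

# Cumulative values quoted above (reduced units, so f = u - s).
mbar_f = np.array([0.0, 1.619069, 2.557990, 2.986302, 3.041156])
mbar_u = np.array([0.0, 1.241970, 1.695000, 1.706555, 1.388906])
mbar_s = np.array([0.0, -0.377099, -0.862990, -1.279746, -1.652250])
bar_f = np.array([0.0, 1.609778, 2.547866, 2.984183, 3.044385])
bar_u = np.array([0.0, 1.337356, 2.140400, 2.512384, 2.563935])
bar_s = np.array([0.0, -0.272422, -0.407467, -0.471798, -0.480450])

# Largest deviation from f = u - s in each decomposition (rounding-level).
print(np.max(np.abs(mbar_f - (mbar_u - mbar_s))))
print(np.max(np.abs(bar_f - (bar_u - bar_s))))
```

So the decompositions are internally consistent; the disagreement is between the two U/S estimates themselves.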
Entropy and enthalpy are always statistically far more noisy than free energy. I wrote about 60% of a paper on this a few years ago that never got finished. Did you calculate the uncertainty for these quantities, preferably by bootstrap?
@mrshirts The previous numbers were cumulative across states. Here is the output for adjacent-state values and uncertainties (the default, without bootstrapping); they're pretty comparable. "BAR" is a series of 2-state MBAR computations (I haven't coded up the error propagation for BAR with bootstrapping; I would love it if someone has that):

MBAR F:  [-1.619069, -0.938921, -0.428311, -0.054854] [0.008802, 0.006642, 0.005362, 0.005133]
"BAR" F: [-1.609778, -0.938088, -0.436317, -0.060202] [0.009879, 0.008740, 0.007372, 0.006381]
MBAR U:  [-1.241970, -0.453030, -0.011556, 0.317650] [0.011600, 0.008035, 0.005988, 0.005625]
"BAR" U: [-1.337356, -0.803044, -0.371984, -0.051550] [0.011062, 0.008801, 0.007036, 0.006106]
MBAR S:  [0.377099, 0.485891, 0.416756, 0.372504] [0.012242, 0.009487, 0.007131, 0.005362]
"BAR" S: [0.272422, 0.135045, 0.064332, 0.008652] [0.008932, 0.006298, 0.004190, 0.002941]
Here are the values and uncertainties with n_bootstraps=100:

MBAR F:  [-1.619069, -0.938921, -0.428311, -0.054854] [0.008993, 0.006930, 0.005482, 0.004979]
"BAR" F: [-1.609778, -0.938088, -0.436317, -0.060202] [0.010079, 0.007803, 0.007863, 0.006922]
MBAR U:  [-1.241970, -0.453030, -0.011556, 0.317650] [0.011248, 0.006933, 0.005399, 0.005130]
"BAR" U: [-1.337356, -0.803044, -0.371984, -0.051550] [0.011885, 0.009307, 0.008083, 0.006947]
MBAR S:  [0.377099, 0.485891, 0.416756, 0.372504] [0.011365, 0.007181, 0.005037, 0.003795]
"BAR" S: [0.272422, 0.135045, 0.064332, 0.008652] [0.008418, 0.005638, 0.003662, 0.002183]
> Entropy and enthalpy are always statistically far more noisy than free energy. I wrote about 60% of a paper on this a few years ago that never got finished. Did you calculate the uncertainty for these quantities, preferably by bootstrap?
The paper referenced in this PR covers that ;)
> (I haven't coded up the error propagation for BAR with bootstrapping
You should not need to do error propagation for BAR with bootstrapping. You calculate the values (given the ENTIRE PROCESS of adding up N-1 BAR estimates) with N bootstrap samples. One of the biggest advantages of bootstrapping is avoiding error propagation.
Something is up with those uncertainties. They should be larger than the free energy uncertainties. There are correlations between the U and the S that should be showing up.
Free energies are all behaving really nicely (within uncertainties between BAR and MBAR), U and S are not.
I probably won't have time to look at it today, maybe Friday or the weekend.
> You should not need to do error propagation for BAR with bootstrapping. You calculate the values (given the ENTIRE PROCESS of adding up N-1 BAR estimates) with N bootstrap samples. One of the biggest advantages of bootstrapping is avoiding error propagation.
I meant that I don't have the code to populate the cumulative FE matrix in alchemlyb, which is why I reported the adjacent states.
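For what it's worth, populating the cumulative matrix from the adjacent-state estimates is just a cumulative sum. A small sketch using the adjacent MBAR Δf values quoted above (the sign convention is inferred from the cumulative numbers earlier in the thread, so treat it as an assumption):

```python
import numpy as np

# Adjacent-state MBAR Delta f values quoted above: delta_f_[i+1, i] for i = 0..3.
adjacent = np.array([-1.619069, -0.938921, -0.428311, -0.054854])

# Free energy of each state relative to state 0 (sign inferred from the thread).
f = np.concatenate(([0.0], np.cumsum(-adjacent)))

# Full cumulative matrix: D[i, j] = f_j - f_i.
D = f[None, :] - f[:, None]

print(np.round(D[0], 6))  # first row matches the cumulative MBAR column to rounding
```

Bootstrapped uncertainties for the off-adjacent entries would then come from repeating this construction per bootstrap sample, as described above.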
One can reproduce these numbers using my branch for this alchemlyb PR and the following code:
```python
import numpy as np

import alchemlyb
from alchemlyb.parsing import gmx
from alchemlyb.estimators import MBAR, BAR
from alchemtest.gmx import load_benzene

n_bootstrap = 100

dataset = load_benzene()
u_nk = alchemlyb.concat([gmx.extract_u_nk(file, T=300) for file in dataset["data"]["Coulomb"]])

mbar = MBAR(n_bootstraps=n_bootstrap).fit(u_nk, compute_entropy_enthalpy=True)
lx = len(mbar.delta_f_) - 1
print("MBAR F: {} {}".format(
    np.round([mbar.delta_f_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
    np.round([mbar.d_delta_f_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
))
print("MBAR U: {} {}".format(
    np.round([mbar.delta_h_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
    np.round([mbar.d_delta_h_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
))
print("MBAR S: {} {}".format(
    np.round([mbar.delta_s_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
    np.round([mbar.d_delta_s_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
))

bar = BAR(n_bootstraps=n_bootstrap).fit(u_nk, compute_entropy_enthalpy=True, use_mbar=True)
print("BAR F: {} {}".format(
    np.round([bar.delta_f_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
    np.round([bar.d_delta_f_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
))
print("BAR U: {} {}".format(
    np.round([bar.delta_h_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
    np.round([bar.d_delta_h_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
))
print("BAR S: {} {}".format(
    np.round([bar.delta_s_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
    np.round([bar.d_delta_s_.to_numpy()[i + 1, i] for i in range(lx)], decimals=6),
))
```
This PR adds a function to calculate the enthalpic and entropic contributions of the free energy from BAR according to a previously published derivation.