choderalab / pymbar

Python implementation of the multistate Bennett acceptance ratio (MBAR)
http://pymbar.readthedocs.io
MIT License
235 stars 91 forks source link

The discrepancy in calculating u_kln between parallel-tempering-2dpmf.py and umbrella-sampling.py #479

Open Vinvin-Zhang opened 1 year ago

Vinvin-Zhang commented 1 year ago

After reading the examples for calculating the PMF based on data by parallel-templering and umbrella-sampling simulations, i found that there is a discrepancy in calculating u_kln(reduced potential energy).

Following the line 222 to 226 in (https://github.com/mrshirts/pymbar/blob/master/examples/parallel-tempering-2dpmf/parallel-tempering-2dpmf.py), reduced potential energy( u_kln[k,l,n] ) of trajectory segment n of temperature k evaluated at temperature l was calculated by following:.

    print("Computing reduced potential energies...")
    u_kln = numpy.zeros([K,K,N_max]) # u_kln[k,l,n] is reduced potential energy of trajectory segment n of temperature k evaluated at temperature l
    for k in range(K):
       for l in range(K):
          u_kln[k,l,0:N_k[k]] = beta_k[l] * U_kn[k,0:N_k[k]]

which means that u_kln[k,l,n] = beta_k[l] U_kn[k,n] = beta_k[l] U_kn[k,l,n] ( without bias, U_kn[k,n] = U_kn[k,1,n] = U_kn[k,2,n] = ... = U_kn[k,l,n] = .... ). [equation 1]

While in the examples of analyzing the data by umbrella sampling, as shown in the line 141 to 143 of https://github.com/mrshirts/pymbar/blob/master/examples/umbrella-sampling-pmf/umbrella-sampling.py, it is calculated by:

    # Compute energy of snapshot n from simulation k in umbrella potential l
    u_kln[k,:,n] = u_kn[k,n] + beta_k[k] * (K_k/2.0) * dchi**2

where

    u_kn[k,n] = beta_k[k] * (float(tokens[2]) - float(tokens[1])) # reduced potential energy without umbrella restraint(line 89)

which indicates: u_kln[k,l,n] = beta_k[k] U0_kn[k,n] + beta_k[k] (K_k[l] / 2.0) * dchi[l]2 = betak[k_] ( U0_kn[k,n] + (K_k[l] / 2.0) \ dchi[l]2 ) = betak[k_] * U_kn[k,l,n] [ equation 2]

It seems that the [equation 1] is the correct expression, while the [ equation 2 ] is not.

Generally, based on [ euqation 2 ], for umbrella sampling of constanst temperature(beta_k[k] = beta_k[l]), it is ok, while for different temperature, it will lead to incorrect result.

Here are my suggestions: 1)In line 89 of https://github.com/mrshirts/pymbar/blob/master/examples/umbrella-sampling-pmf/umbrella-sampling.py,

  u_kn[k,n] = beta_k[k] * (float(tokens[2]) - float(tokens[1])) # reduced potential energy without umbrella restraint

should be changed to :

  U_kn[k,n] = float(tokens[2]) - float(tokens[1])  # potential energy without umbrella restraint

2)In line 143,

  u_kln[k,:,n] = u_kn[k,n] + beta_k[k] * (K_k/2.0) * dchi**2 

should be changed to:

  u_kln[k,:,n] = beta_k * ( U_kn[k,n] + (K_k/2.0) * dchi**2 )
mrshirts commented 1 year ago

Thanks for noticing the problem!

Note that the main repository for the release is in choderalab/pymbar not mrshirts/pymbar. However, the same issue is present there. My personal repository is not necessarily in sync with the main repository!

We never noticed the problem since the provided data is only at 300K, and thus the erroneous code is never actually run. But it certainly would be confusing to users.

I think your solution looks correct, as the beta_k is then indexed by l, so then visits all of the other code. I will have to grab a few moments to look over carefully to see the best way to put it in the code (especially since we don't have actual data to check).