choderalab / pymbar

Python implementation of the multistate Bennett acceptance ratio (MBAR)
http://pymbar.readthedocs.io
MIT License
235 stars 91 forks source link

Can I use PyMBAR on NAMD accelerated MD simulations to generate a 2D PMF? #224

Open kkastner opened 8 years ago

kkastner commented 8 years ago

I wanted to know if it is possible to use PyMBAR to generate a 2D Potential of Mean Force (PMF) graph using energies obtained from NAMD. The energies need to be reweighted though as I used accelerated MD (aMD). I wanted to combine the PMFs of two simulations containing the same protein but different ligands, and they are both huge simulations (~100,000 atoms each). Is it possible to do all of this using PyMBAR?

mrshirts commented 8 years ago

Great question. I'm not 100% sure exactly the calculation, but let me try to follow up to reach a good point.

Generally, you have a set of Hamiltonians of interest. There are K_sampled that you actually have collected data from, and K_unsampled additional states you did not collect samples at, but that you would like to calculate thermodynamic properties at. For example, if you are doing a PMF, the K sampled states would be the all of the harmonic restraint hamiltonians, and the unsampled state would be state without weights. You then need to be able to calculate the difference in energy of each point from the sampled state between the sampled state and the unsampled state. With PMF's this is pretty easy, since you just need to know the distance difference that is used in the harmonic oscillator,

A SECOND step would be to calculate the probability of the ligand being in each grid square in the unsampled state; essentially, you have a box function (one inside the box, zero outside) and you want to know the expectation value of that box function.

Note that the grid over which you are calculating the PMF does not have to correspond to the grid that the harmonic restraints are on. You COULD approximate the PMF by just using the free energies of the harmonic restraints, but there are some approximations associated with that choice.

We have an example of how to do this in pymbar/examples/parallel-tempering-2dpmf/.

I'm not sure exactly what is meant by "combining the two PMF's" of different ligands. Can you write out the mathematical definition of combining?

kkastner commented 8 years ago

Thank you for responding! I do not think I can apply that method to my simulations. I have two simulations with trajectories generated and energies calculated every so often. I am interested in all of the trajectories, as I want to use them to generate my PMF, and thus I would have no K_unsampled states to compare against. Also, the ligand in each simulation that I am talking about are always (approximately) in the same position relative to the protein that I am simulating, so the probability of the ligand being in the grid square would always be 1. By "combining the two PMFs", I am talking about combing the PMFs generated from each simulation together into a single PMF chart. I am attaching the type of PMF chart that I want to generate (source paper doi:10.1073/pnas.1309755110). amd_paper_pmf_chart

tjgiese commented 6 years ago

Prof. Shirts, "Note that the grid over which you are calculating the PMF does not have to correspond to the grid that the harmonic restraints are on. You COULD approximate the PMF by just using the free energies of the harmonic restraints, but there are some approximations associated with that choice."

Could you please elaborate a little on what approximations you are talking about?

I recently started using pymbar to create 2D PMFs using the approach you've just described, which sounds the same (to my ears) as the quote (below eq 12 in ref [1]): "Free energy differences and uncertainties between states not sampled are easily estimated by augmenting the set of states with additional reduced potentials ui(x) with the number of samples Ni=0".

However, in this context, this approach really only makes sense if all of the umbrella window simulations used the SAME force constants (and temperatures, etc) because, if they didn't, then one would have to make an "artistic" decision for the unsampled state force constants (and temperatures, etc.). Are these the sort of approximations you are talking about or is it something more fundamental with the maths?

Actually, what I really want to know is: if I use pymbar to generate PMFs by evaluating the free energies of the harmonic restraints, am I doing something fundamentally wrong? In other words, if you were to review a manuscript and saw this, what would your comment be?

Thanks, Tim

[1] Shirts MR and Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129:124105, 2008

tjgiese commented 6 years ago

OK. I looked a bit into it. Apparently, MBAR needs to know: 1. the energy of all states at all samples and 2. the fraction of total samples that each state contributes. Item 2 appears to the key point here. If a state contributes no samples, then it does not enter the denominator of the MBAR equation. WHICH samples a state contributes doesn't matter, but HOW MANY samples a state contributes does matter.

For example, we get different free energy results with the same samples, but changing the association of which state was the contributor of that sample:

import pymbar (x_n, u_kn, N_k, s_n) = pymbar.testsystems.HarmonicOscillatorsTestCase().sample(mode='u_kn') mbar = pymbar.MBAR(u_kn, N_k) f,df,th = mbar.getFreeEnergyDifferences() print "original:",f[0,:] N_k[0] = sum(N_k) N_k[1:]= 0 mbar = pymbar.MBAR(u_kn, N_k) f,df,th = mbar.getFreeEnergyDifferences() print "modified:",f[0,:]

#

Example output:

#

original: [ 0. 0.45959842 0.78389085 1.10540918 2.03454974]

modified: [ 0. -0.27519535 -1.6420375 -3.86740059 -6.74713775]

#