essex-lab / grand

A Python module for carrying out GCMC insertions and deletions of water molecules in OpenMM.
https://grand.readthedocs.io
MIT License
55 stars 18 forks source link

Adams volume calibration #15

Open proteneer opened 1 year ago

proteneer commented 1 year ago

I've been having difficulty re-producing the density data in (https://pubs.acs.org/doi/10.1021/acs.jcim.0c00648) using the water scripts from this repo and I wanted to clarify the settings on the # of waters in states ${0,1,2}$, where $0$ - GCMC Ghost water, $1$ - GCMC Real Water, $2$ - Non-GCMC Real Water. Notably, when converged, I was seeing densities that are consistently higher than that was reported.

When reading up on the Adams parameter definition in the paper(s), I noticed it was defined as:

$B=\beta \mu'_{sol} + \ln \left( \dfrac{V_G}{V^0} \right)$ where $V_G$ is the GCMC volume. Intuitively, the ratio $\dfrac{V_G}{V^0}$ can be interpreted as the average number of GCMC Real waters, $\bar{N}$, also from the Adams paper. This is important, because the acceptance probabilities themselves for insertion and deletion define the $N$ and $\dfrac{1}{N+1}$ prefactors in terms of the number of GCMC Real waters (state=1), also shown in the code here, as opposed to the total number of waters.

However, in the example water scripts, there's usually a large number of non-GCMC waters. If non-GCMC waters are used, shouldn't the calibration of the Adams parameter then be offset by the number of non-GCMC-waters? ie:

$B=\beta \mu'_{sol} + \ln \left( \dfrac{V_G}{V^0} - N_W \right)$ where $N_W$ is the number of non-GCMC waters.

proteneer commented 1 year ago

After some more digging, it looks like for the water.py script, water is only assigned to states 0 or 1, code. The non-GCMC water state of 2 is only ever used in the sphere samplers. However, I think this introduces another problem in the spherical samplers, since the occupancy of the protein would also be at play, and I suspect a similar kind of correction is likely needed to account for the shifted occupancy.

olliemelling commented 1 year ago

Hi,

Will try and have a more in-depth look at this next week when I have some more time, and will run some density tests to confirm results.

Regarding the state assigned to water molecules - any coupled (i.e. real) water within the sphere is assigned state 1, so if a water that was not initially a GCMC water moves into the sphere during a simulation, it is assigned a state of 1. There should never be any real waters within the sphere whose state is not 1. A state of 1 essentially indicates that the water is a candidate for deletion, should a deletion move be attempted

This is why in the system samplers there are no waters with state = 2, as real waters from anywhere within the system can be deleted. I don't think the offset of $-N_W$ is therefore required as $N_W=0$ within the GCMC region (whether that be sphere or system).

proteneer commented 1 year ago

Thanks Ollie - the current state of the code is correct for the GCMC System Samplers, (i.e. no waters assigned to a state=2), but I'm a bit concerned about the case when non-GCMC molecules are present.

Suppose we were to add a non-GCMC aliphatic molecule, eg. CCCCCCC into a box of water. Do the NPT and muVT simulations still agree on the densities still (using the definition of the Adams parameter in the paper)? I can independently try to verify this as well.

olliemelling commented 1 year ago

Some further info, after having discussed this in more detail with the original developer of grand...

Regarding the occluded volume issue, the approximation that $Vg/V{std}$ can be interpreted as the average number of waters in the GCMC sphere is not technically correct, and is a mistake that has been made in a number of papers historically. It is more accurate to say that it's the average number of waters you'd expect in that volume of bulk water. Having said that, it shouldn't make much difference as B is logarithmically dependent on the average number of waters so you will likely end up close enough anyway.

We don't therefore need to account for volume occluded by protein etc. There is some empirical evidence of this in the SI of this paper where extending the GCMC box into the protein is shown to not affect the results.

To properly test on a solvated aliphatic molecule as you suggested will require very long and thorough sampling to ensure convergence.

proteneer commented 1 year ago

The density of bulk water is incredibly sensitive to choice of parameters (excess chemical potential and standard volume), and very slow to converge.

I don't think this is the issue here. We've re-calibrated the Adams parameters using our own forcefield (and choice of temperature/cutoff/etc.) very carefully, using ~30 windows with precision of about +- 0.005kJ/mol, and standard volumes of +- 0.00003 nm^3.

It is more accurate to say that it's the average number of waters you'd expect in that volume of bulk water.

This is what I meant as well by "average", i.e. relative to bulk. (Using the thermal de Broglie wavelength is in-appropriate, as it comes from the kinetic component of the grand partition function, and is also mass dependent)

We don't therefore need to account for volume occluded by protein etc. There is some empirical evidence of this in the SI of this paper where extending the GCMC box into the protein is shown to not affect the results.

Figure S.3 is showing results for the water binding free energy (and comparing the results computed via GCI against double decoupling). I'm not sure if this is directly translatable.

To properly test on a solvated aliphatic molecule as you suggested will require very long and thorough sampling to ensure convergence.

I don't see why this would take any more than the typical water validation test. The decorrelation times for a small aliphatic chain should be relatively fast. For the sake of argument, the aliphatic molecule can be replaced by a large, rigid structure (eg. cubane).

proteneer commented 1 year ago

Ultimately, suppose we run an NPT simulation with cubane (or some other rigid molecule), solvated in a 40x40x40=V_GCMC Angstrom box. We compute the average volume V_0=<V> and the average density p_0=<p>. Then we run a muVT GCMC simulation with V=V_0, with calibrated Adams parameters and the volume correction as described, do we expect to recover the same density p_0?

Note that the condition for equilibrium is when the internal chemical potential (i.e. u_internal=dG/dN) is equal to u_external. This can only be possible if the system has a higher density to account for the occluded volume.