Closed orbeckst closed 1 year ago
For other convergence analysis see http://getyank.org/latest/algorithms.html
@orbeckst Thanks for this issue. I'm interested in this way of checking convergence as well. I wonder if you have any code snippet or library that we could use?
convergence.zip @xiki-tempula Here's the script we used in SAMPL6 and SAMPL7 challenges.
@VOD555 Thank you for the script. I have a question with regard to the paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8397498/#FD17
I wonder how do you compute the equation 17 and 18?
Is the Ac actually being used? Thank you.
Quick comment (@VOD555 should be able to provide more details):
def C_Rc_func_factory(Rc_lambdas):
def f(Rc):
return np.sum([np.heaviside(Rc - Rc_lambda, 0.5) for Rc_lambda in Rc_lambdas]) / len(Rc_lambdas)
return f
Not tested...
P.S.: Fig 4 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7667952/ should make clearer, what 𝒞(𝑅𝐶) looks like :
@orbeckst Thanks. I know this might sound like a reviewer question. Why do you "Divide all the counts by the total number of lambda windows." Instead of just taking the average of the 𝑅𝐶 or (1-𝑅𝐶) across all lambda windows?
I wrote down one way in which one can calculate the cumulative probability function that you see in the figure. I am not sure that you can do this with a simple average.
Why do you "Divide all the counts by the total number of lambda windows."
So that my $\mathcal{C}(R_c)$ is a number between 0 and 1.
(If I am making a mistake somewhere here I hope @VOD555 will correct me.)
We initially wanted a quantity that told us something about the distribution of Rc values. The distribution itself says something about individual values (but we didn't really use it). The cumulative distribution seemed important to us as a measure that assesses all windows together because all lambda values are used for a single free energy estimate. The Ac was then just a way to condense the C(Rc) into a single number.
I don't think that the $A_c$ is equal to the average over all the $R_c$ (or over $1 - R_c$): The average is
$$ \langle R_c \rangle = \int_0^1 dR_c p(r_c) R_c $$
where $p(R_c)$ is probability to observe $R_c$ over all $\lambda$.
On the other hand,
$$ \begin{align} \mathcal{C}(R_c) &= \int_0^{R_c} dR_c' p(R_c')\ A_c &= \int_0^1 dR_c \mathcal{C}(R_c) \end{align} $$
so it's clear that $A_c \neq \langle R_c \rangle$. They measure different things.
Sorry if I misunderstood your question .... please feel free to ask again.
Thanks. Some small questions. Did you run it on only dHdl or the relevant column on the u_nk as well? Did you run the R_c on the original dataset or the dataset after equilibrium detection?
In our SAMPL7 work we only run it on dHdl, but in principle, one can run it on other column on the u_nk. We run the R_c after equilibrium detection.
Add the analysis of converging fraction R_c from the paper
S. Fan, B. I. Iorga, and O. Beckstein. Prediction of octanol-water partition coefficients for the SAMPL6- log P molecules using molecular dynamics simulations with OPLS-AA, AMBER and CHARMM force fields. Journal of Computer-Aided Molecular Design, 34:543–560, 2020. https://doi.org/10.1007/s10822-019-00267-z
cc @VOD555