alchemistry / alchemlyb

the simple alchemistry library
https://alchemlyb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
195 stars 49 forks source link

add R_c (convergence fraction) convergence analysis #104

Closed orbeckst closed 1 year ago

orbeckst commented 4 years ago

Add the analysis of converging fraction R_c from the paper

S. Fan, B. I. Iorga, and O. Beckstein. Prediction of octanol-water partition coefficients for the SAMPL6- log P molecules using molecular dynamics simulations with OPLS-AA, AMBER and CHARMM force fields. Journal of Computer-Aided Molecular Design, 34:543–560, 2020. https://doi.org/10.1007/s10822-019-00267-z

cc @VOD555

orbeckst commented 3 years ago

For other convergence analysis see http://getyank.org/latest/algorithms.html

xiki-tempula commented 2 years ago

@orbeckst Thanks for this issue. I'm interested in this way of checking convergence as well. I wonder if you have any code snippet or library that we could use?

VOD555 commented 2 years ago

convergence.zip @xiki-tempula Here's the script we used in SAMPL6 and SAMPL7 challenges.

xiki-tempula commented 2 years ago

@VOD555 Thank you for the script. I have a question with regard to the paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8397498/#FD17

I wonder how do you compute the equation 17 and 18?

Is the Ac actually being used? Thank you.

orbeckst commented 2 years ago

Quick comment (@VOD555 should be able to provide more details):

  1. In order to compute the cumulative prob. distribution C(Rc) (eq 17), you take the Rc for each lambda window (let's call them Rc_lambda, and then count for a range of values 0 ≤Rc ≤ 1 (e.g. evenly spaced), how many Rc_lambda ≤ Rc. Divide all the counts by the total number of lambda windows. (There might be a smart way to do this with cumulative sums and @VOD555 should be able to point you to actual code.) You can also create a function from heaviside step functions, something like
    def C_Rc_func_factory(Rc_lambdas):
        def f(Rc):
             return np.sum([np.heaviside(Rc - Rc_lambda, 0.5) for Rc_lambda in Rc_lambdas]) / len(Rc_lambdas)
        return f

    Not tested...

  2. We used Ac in Fig 2, and Tables 2-5.
orbeckst commented 2 years ago

P.S.: Fig 4 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7667952/ should make clearer, what 𝒞(𝑅𝐶) looks like :

Cumulative distribution functions 𝒞(𝑅𝐶) of the convergence time fraction Rc for Coulomb and VDW λ windows

xiki-tempula commented 2 years ago

@orbeckst Thanks. I know this might sound like a reviewer question. Why do you "Divide all the counts by the total number of lambda windows." Instead of just taking the average of the 𝑅𝐶 or (1-𝑅𝐶) across all lambda windows?

orbeckst commented 2 years ago

I wrote down one way in which one can calculate the cumulative probability function that you see in the figure. I am not sure that you can do this with a simple average.

Why do you "Divide all the counts by the total number of lambda windows."

So that my $\mathcal{C}(R_c)$ is a number between 0 and 1.

(If I am making a mistake somewhere here I hope @VOD555 will correct me.)

We initially wanted a quantity that told us something about the distribution of Rc values. The distribution itself says something about individual values (but we didn't really use it). The cumulative distribution seemed important to us as a measure that assesses all windows together because all lambda values are used for a single free energy estimate. The Ac was then just a way to condense the C(Rc) into a single number.

I don't think that the $A_c$ is equal to the average over all the $R_c$ (or over $1 - R_c$): The average is

$$ \langle R_c \rangle = \int_0^1 dR_c p(r_c) R_c $$

where $p(R_c)$ is probability to observe $R_c$ over all $\lambda$.

On the other hand,

$$ \begin{align} \mathcal{C}(R_c) &= \int_0^{R_c} dR_c' p(R_c')\ A_c &= \int_0^1 dR_c \mathcal{C}(R_c) \end{align} $$

so it's clear that $A_c \neq \langle R_c \rangle$. They measure different things.

Sorry if I misunderstood your question .... please feel free to ask again.

xiki-tempula commented 2 years ago

Thanks. Some small questions. Did you run it on only dHdl or the relevant column on the u_nk as well? Did you run the R_c on the original dataset or the dataset after equilibrium detection?

VOD555 commented 2 years ago

In our SAMPL7 work we only run it on dHdl, but in principle, one can run it on other column on the u_nk. We run the R_c after equilibrium detection.