Closed mcwitt closed 3 years ago
The answer is a bit more complicated than originally described. We apparently need to do a two-pass solution:
In the first pass, we don't use any experimental free energies, and set one compound from each connected graph to have an absolute free energy of zero. Let's call the estimates of the dimensionless absolute binding free energy of each microstate species i
of compound c
from this first pass as g_1[c,i]
, where a more positive number is less favorable in binding.
In the next pass, set the experimental binding free energies for each microstate species i
of compound c
that has experimental data g_exp[c]
using the following relation:
g_exp_microstate[c,i] = g_exp[c] - ln( exp(-(s[c,i] + g_1[c,i])) / sum(exp(-(s[c,i] + g_1[c,:]))) )
Here, g_exp_microstate[c,i]
is what goes into DiffNet as the dimensionless absolute free energy of binding of that microstate, s[c,i]
is the dimensionless state energy penalty for compound c
microstate i
(which we assume are all 0 by default), and g_1[c,i]
is the result of the first pass above.
We then solve the second pass of DiffNet to get g_2[c,i]
.
Finally, we combine these to get the overall binding free energy for each compound as
g[c] = - ln( sum(exp(-(s[c,:] + g_2[c,i:]))) )
where the :
is the sum over all microstates i
.
In the first pass, we don't use any experimental free energies, and set one compound from each connected graph to have an absolute free energy of zero.
@jchodera To clarify, do you mean "one microstate of one compound from each connected graph" (i.e. a single node), or should we set all microstates associated with the chosen compound to zero?
In the first pass, we pick one (arbitrary) microstate and say that we have measured its absolute DeltaG = 0 +- dDeltaG (where the uncertainty is somewhat arbitrary---pick unity, for example) just to break the degeneracy.
I think you can avoid specifying any absolute DeltaG at all in this pass if you just use the pseudoinverse, but I can't recall if Hannah's script already did that or not.
I think you can avoid specifying any absolute DeltaG at all in this pass if you just use the pseudoinverse
Hmm, it looks like Arsenic does use pseudoinverse: https://github.com/openforcefield/Arsenic/blob/master/arsenic/stats.py#L187
So maybe zeroing an arbitrary node is is not necessary? I'll experiment with this.
I think you can do exactly what arsenic does for the first pass! You just need relative free energies.
We have experimental free energies by compound for some compounds, but need microstate-level free energies for MLE analysis. I believe @jchodera has an idea of how to achieve this.