choderalab / assaytools

Modeling and Bayesian analysis of fluorescence and absorbance assays.
http://assaytools.readthedocs.org
GNU Lesser General Public License v2.1
18 stars 11 forks source link

Adding automated equilibration detection to delG calculation #69

Closed sonyahanson closed 7 years ago

sonyahanson commented 7 years ago

First commit: Added delG trace to quickmodel output.

sonyahanson commented 7 years ago

Think I've done this, it's pretty straightforward.

[t,g,Neff_max] = pymbar.timeseries.detectEquilibration(mcmc.DeltaG.trace())

So far just using this to color the trace figure so far, but think the first two results are interesting:

delg_bosutinib-ab-2016-11-21 1057

delg_bosutinib isomer-cd-2016-11-21 1104

jchodera commented 7 years ago

Cool!

For the Bosutinib-AB dataset, that bistability worries me, though I've seen it before. If you omit the outlier point, does it still have that same kind of bistability?

sonyahanson commented 7 years ago

I have yet to analyze the exact dataset with the outlier omitted, but for a similar dataset this bistability is much less of a problem.

delg_bosutinib-ij-2016-11-21 1153

jchodera commented 7 years ago

The bistability is still concerning, since there are some crazy-tight affinities being sampled, and the confidence interval is being thrown way off as a result.

I wonder if it's possible to plot a bunch of traces on the same plot and see what specifically is causing the huge swings.

sonyahanson commented 7 years ago

First 95% interval plots coming out! Any suggestions on this more than welcome! delg_bosutinib isomer-cd-2016-11-22 2021

jchodera commented 7 years ago

What's the big "2.5%-ile, 50.0%-ile, 97.5%-ile" stuff on the plot?

Can we just extract the mean and 95% CI and print it as "X [Y, Z] kT"?

jchodera commented 7 years ago

I'm super confused by the one green point on the bottom plot. What is that?

jchodera commented 7 years ago

For the top plots, any chance we can draw thin non-dotted lines with transparency?

sonyahanson commented 7 years ago

Can we just extract the mean and 95% CI and print it as "X [Y, Z] kT"?

Previously you suggested using the mean was misleading? Maybe using the 50th percentile value is better?

Just to be clear on what you mean by "X [Y, Z] kT" in this case this would be (here with 50th percentile value and not mean): -13.2 [-14.2,-12.3] kT?

sonyahanson commented 7 years ago

I'm super confused by the one green point on the bottom plot. What is that?

The t=0 legend indicates the t at which equilibration is reached according to pymbar.timeseries.detectEquilibration(). See discussion above from two days ago. Certainly t is not quite appropriate here.

jchodera commented 7 years ago

Previously you suggested using the mean was misleading? Maybe using the 50th percentile value is better?

For highly skewed data, the median (50th percentile value) is indeed probably better.

Just to be clear on what you mean by "X [Y, Z] kT" in this case this would be (here with 50th percentile value and not mean): -13.2 [-14.2,-12.3] kT?

This is great!

sonyahanson commented 7 years ago

Updated figure:

delg_bosutinib isomer-cd-2016-11-28 1409

sonyahanson commented 7 years ago

Going to go ahead an merge this unless there are further comments.

jchodera commented 7 years ago

Sweet! Thanks!

Could we get a plot like this for the constant-pH grant?

jchodera commented 7 years ago

Wait, the MAP black line is in the wrong place, isn't it? It should be at the maximum of the DeltaG, right?

Is the MAP estimate computed at the beginning or end of sampling?

sonyahanson commented 7 years ago

Looks like the MAP is computed at the end of sampling:

map = pymcmodels.map_fit(pymc_model)
DeltaG_map = map.DeltaG.value

pymcmodels.map_fit is here: https://github.com/choderalab/assaytools/blob/master/AssayTools/pymcmodels.py#L403 And uses this: https://pymc-devs.github.io/pymc/modelfitting.html#maximum-a-posteriori-estimates

I'm not sure why it's not more central...

jchodera commented 7 years ago

OK, thanks!

I'm not sure why it's not more central...

It's fitting the multidimensional parameter space, not the marginal distribution P(DeltaG) where everything else is integrated out, so the P(\theta) that it finds has a slightly different value of DeltaG than the maximum of the marginal P(DeltaG). Definitely possible, but surprising!

sonyahanson commented 7 years ago

Interesting!

sonyahanson commented 7 years ago

Merging. This is done. Not sure the equilibration detection has helped any of the problems we're having with the deltaG estimates, but having the 95% credibility interval now is nice.