Closed sonyahanson closed 7 years ago
Think I've done this, it's pretty straightforward.
[t,g,Neff_max] = pymbar.timeseries.detectEquilibration(mcmc.DeltaG.trace())
So far just using this to color the trace figure so far, but think the first two results are interesting:
Cool!
For the Bosutinib-AB dataset, that bistability worries me, though I've seen it before. If you omit the outlier point, does it still have that same kind of bistability?
I have yet to analyze the exact dataset with the outlier omitted, but for a similar dataset this bistability is much less of a problem.
The bistability is still concerning, since there are some crazy-tight affinities being sampled, and the confidence interval is being thrown way off as a result.
I wonder if it's possible to plot a bunch of traces on the same plot and see what specifically is causing the huge swings.
First 95% interval plots coming out! Any suggestions on this more than welcome!
What's the big "2.5%-ile, 50.0%-ile, 97.5%-ile" stuff on the plot?
Can we just extract the mean and 95% CI and print it as "X [Y, Z] kT"?
I'm super confused by the one green point on the bottom plot. What is that?
For the top plots, any chance we can draw thin non-dotted lines with transparency?
Can we just extract the mean and 95% CI and print it as "X [Y, Z] kT"?
Previously you suggested using the mean was misleading? Maybe using the 50th percentile value is better?
Just to be clear on what you mean by "X [Y, Z] kT" in this case this would be (here with 50th percentile value and not mean):
-13.2 [-14.2,-12.3] kT
?
I'm super confused by the one green point on the bottom plot. What is that?
The t=0
legend indicates the t at which equilibration is reached according to pymbar.timeseries.detectEquilibration()
. See discussion above from two days ago. Certainly t
is not quite appropriate here.
Previously you suggested using the mean was misleading? Maybe using the 50th percentile value is better?
For highly skewed data, the median (50th percentile value) is indeed probably better.
Just to be clear on what you mean by "X [Y, Z] kT" in this case this would be (here with 50th percentile value and not mean): -13.2 [-14.2,-12.3] kT?
This is great!
Updated figure:
Going to go ahead an merge this unless there are further comments.
Sweet! Thanks!
Could we get a plot like this for the constant-pH grant?
Wait, the MAP black line is in the wrong place, isn't it? It should be at the maximum of the DeltaG, right?
Is the MAP estimate computed at the beginning or end of sampling?
Looks like the MAP is computed at the end of sampling:
map = pymcmodels.map_fit(pymc_model)
DeltaG_map = map.DeltaG.value
pymcmodels.map_fit is here: https://github.com/choderalab/assaytools/blob/master/AssayTools/pymcmodels.py#L403 And uses this: https://pymc-devs.github.io/pymc/modelfitting.html#maximum-a-posteriori-estimates
I'm not sure why it's not more central...
OK, thanks!
I'm not sure why it's not more central...
It's fitting the multidimensional parameter space, not the marginal distribution P(DeltaG) where everything else is integrated out, so the P(\theta) that it finds has a slightly different value of DeltaG than the maximum of the marginal P(DeltaG). Definitely possible, but surprising!
Interesting!
Merging. This is done. Not sure the equilibration detection has helped any of the problems we're having with the deltaG estimates, but having the 95% credibility interval now is nice.
First commit: Added delG trace to quickmodel output.