Why are we getting multimodal predictions for delG?

sonyahanson commented 8 years ago

See below images from the second half of this notebook: https://github.com/choderalab/assaytools/blob/master/data/spectra/Spectra-Analysis.ipynb

Hoping this will clear up with increased sampling...

srcerl srcgef

jchodera commented 8 years ago

Can you zoom in on the DeltaG trace vs time in In[19] of that notebook? It looks to me like there's just long correlation time fluctuations in the trace that lead these unconverged plots.

jchodera commented 8 years ago

You could even compute the correlation time with pymbar.timeseries.statisticalInefficiency() if you want to quantify this.

sonyahanson commented 8 years ago

Is that what you're referring to?

delg_mcmc

gregoryross commented 8 years ago

That certainly looks like the samples for delG are very correlated. If you calculate the statistical inefficiency as John suggests, perhaps you could thin the data by an amount t, where

g = 1 + 2t,

with g is the statistical inefficiency and t is the autocorrelation time. That could help with removing the multi-modality.

sonyahanson commented 8 years ago

Thanks! Will look at that more next week. For now just tested what increasing the iterations would do, and it does seem to help.

It seems like the number of iterations had been reduced for debugging in run_mcmc in pymcmodels.py(https://github.com/choderalab/assaytools/blob/master/AssayTools/pymcmodels.py#L455), so despite the fact that it seemed like the number of samples was high from top of the file:

#=============================================================================================
# Parameters for MCMC sampling
#=============================================================================================

DG_min = np.log(1e-15) # kT, most favorable (negative) binding free energy possible; 1 fM
DG_max = +0 # kT, least favorable binding free energy possible
niter = 500000 # number of iterations
nburn = 50000 # number of burn-in iterations to discard
nthin = 500 # thinning interval

The actual numbers used for the figures above were considerably smaller (nthin = 20, nburn = 20 000, niter = 40 000). Below I've commented out the # DEBUG section and rerun the same analysis so that now nthin = 20, nburn = 200 000, niter = 400 000. So that's ten times more sampling than above, though half of it is being dropped as 'burn in'. I ran it twice just to see the rough consistency of the results.

Run 1:

srcerl_more srcgef_more delg_more

Run 2:

srcerl2green_more srcgef2_more delg2_more

gregoryross commented 8 years ago

Presumably the delG trace is for SrcErl? It seems to have an odd behaviour with jumping down to much lower binding free energies.

sonyahanson commented 8 years ago

Yes, it's for SrcErl.

sonyahanson commented 7 years ago

We definitely see this jumping behavior in the delG trace most prominently for the data with prominent outliers, so maybe we can kill two birds with one stone (#52):

delg_bosutinib-ab-2016-11-28 1402

choderalab / assaytools

Why are we getting multimodal predictions for delG? #64