choderalab / pymbar

Python implementation of the multistate Bennett acceptance ratio (MBAR)
http://pymbar.readthedocs.io
MIT License
237 stars 92 forks source link

subsampleCorrelatedData for 3d coordinate data #315

Closed ljmartin closed 4 years ago

ljmartin commented 4 years ago

Hi all, Thanks for providing PyMBAR.

Is it possible to use this approach with 3d coordinate data? My first hurdle is that subSampleCorrelatedData appears to only take ndarrays of shape (n,) or (n,1), whereas coordinates obviously will be shape (n,3).

example:

timeseries.subsampleCorrelatedData(np.random.normal(0,1,(10,)))
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

timeseries.subsampleCorrelatedData(np.random.normal(0,1,(10,2)))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-192-a6a6f13247e8> in <module>
----> 1 timeseries.subsampleCorrelatedData(np.random.normal(0,1,(10,2)))

~/miniconda3/envs/lew_conda/lib/python3.7/site-packages/pymbar/timeseries.py in subsampleCorrelatedData(A_t, g, fast, conservative, verbose)
    690         if verbose:
    691             print("Computing statistical inefficiency...")
--> 692         g = statisticalInefficiency(A_t, A_t, fast=fast)
    693         if verbose:
    694             print("g = %f" % g)

~/miniconda3/envs/lew_conda/lib/python3.7/site-packages/pymbar/timeseries.py in statisticalInefficiency(A_n, B_n, fast, mintime, fft)
    172 
    173         # compute normalized fluctuation correlation function at time t
--> 174         C = np.sum(dA_n[0:(N - t)] * dB_n[t:N] + dB_n[0:(N - t)] * dA_n[t:N]) / (2.0 * float(N - t) * sigma2_AB)
    175         # Terminate if the correlation function has crossed zero and we've computed the correlation
    176         # function at least out to 'mintime'.

ValueError: operands could not be broadcast together with shapes (10,2) (9,2) 

Ultimately I've got simulated (serial) tempering results of ligands freely diffusing across a 3d volume that includes a binding site, and would like to determine the lowest free energy region. If you can comment on whether this will be feasible then that would also be appreciated.

Thanks! Lewis

mrshirts commented 4 years ago

Right now this isn't really possible. I would suggest looking at time series of a single scalar quantity (fluctuation in distance around a mean or otherwise significant location), or possibly looking at 3 timeseries, and looking at the maximum correlation time.

ljmartin commented 4 years ago

Thanks!