choderalab / MSMs

Markov state models
GNU General Public License v2.0

Experimental CK2 minRMSD-based clustering #9

Closed jchodera closed 8 years ago

jchodera commented 8 years ago

This PR checks in a script to do a very basic minRMSD-based clustering of CK2 with equitemporal generator selection.
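For readers unfamiliar with the approach, here is a minimal sketch of equitemporal generator selection followed by nearest-generator (Voronoi) assignment. It is not the script checked in with this PR. The function names and toy data are hypothetical, and plain absolute difference on 1-D points stands in for minRMSD so that the example runs without any trajectory library.

```python
def equitemporal_generators(n_frames, n_clusters):
    """Return indices of frames spaced evenly in time to serve as cluster generators."""
    if not 0 < n_clusters <= n_frames:
        raise ValueError("need 0 < n_clusters <= n_frames")
    stride = n_frames // n_clusters
    return [i * stride for i in range(n_clusters)]


def assign_to_generators(frames, generator_indices, distance):
    """Assign every frame to its nearest generator, giving a Voronoi partition."""
    assignments = []
    for frame in frames:
        dists = [distance(frame, frames[g]) for g in generator_indices]
        assignments.append(dists.index(min(dists)))  # ties go to the earliest generator
    return assignments


# Toy stand-in for a trajectory: 1-D "conformations" (assumption, not CK2 data).
toy_frames = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
toy_generators = equitemporal_generators(len(toy_frames), 3)
# Absolute difference replaces minRMSD here; a real run would pass an RMSD-after-alignment metric.
toy_assignments = assign_to_generators(toy_frames, toy_generators, lambda a, b: abs(a - b))
```

Because generators are picked by time rather than by geometry, regions that the trajectory visits often receive many generators, while rarely visited regions may receive none.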

Clustered snapshot identities are on hal here:

/cbio/jclab/home/chodera/github/choderalab/msms/MSMs.jchodera/jchodera/CK2/pyemma

rafwiewiora commented 8 years ago

So do you still think there's something wrong with the clustering in PyEMMA? You mentioned you were getting weird results on this dataset too? I would love to use something proven to work OK on the SETD8 dataset.

jchodera commented 8 years ago

I suspect there is. I think we should test on something simple like the alanine dipeptide dataset from MSMBuilder.

rafwiewiora commented 8 years ago

ok!

jchodera commented 8 years ago

@rafwiewiora: Are you or @maxentile able to take a stab at testing minRMSD clustering on an alanine dipeptide dataset, or should I do that?

maxentile commented 8 years ago

Hi, what exactly are we interested in testing here? Regular-time clustering will probably be fragile, regardless of metric...

maxentile commented 8 years ago

To do the same analysis except with the alanine dipeptide dataset in msmbuilder, we can insert:

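# Fetch the alanine dipeptide example dataset bundled with msmbuilder.
from msmbuilder.example_datasets import AlanineDipeptide
trajs = AlanineDipeptide().get().trajectories
# Write each trajectory to an HDF5 file and collect the filenames.
trajectory_filenames = []
for i, traj in enumerate(trajs):
    fname = 'alanine_{0}.h5'.format(i)
    trajectory_filenames.append(fname)
    traj.save_hdf5(fname)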

before line 36 in cluster.py.

Are we interested in checking for correctness of the pyemma implementation? Or some measure of the quality of the resulting discretization?

maxentile commented 8 years ago

When looking at coarse-graining algorithms a few months ago, I collected some results on this dataset with a different clustering algorithm (k-medoids) but the same metric (minRMSD), in case this is of interest here: https://github.com/maxentile/automatic-state-decomposition/blob/master/decompose-py/Alanine%20benchmark%20%2B%20performance%20comparison.ipynb

jchodera commented 8 years ago

Thanks, @maxentile! To be clearer here: The timescales resulting from my minRMSD clustering of 1M CK2 datapoints were so poor that they reminded me of earlier bugs in my own minRMSD code, which caused it to produce incorrect Voronoi partitions of configuration space. I wanted to be sure of the following things:

That code snippet will very much help!

jchodera commented 8 years ago

That notebook with K-medoids minRMSD clustering gives a beautiful implied timescales plot, by the way.
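For context on what an implied timescales plot reports, here is a minimal sketch of the standard relation t = -tau / ln|lambda|, where lambda is a non-stationary eigenvalue of the transition matrix estimated at lag time tau. The two-state matrix, its transition probabilities, and the lag time are hypothetical values chosen for illustration; they do not come from the notebook or from the CK2 data.

```python
import math


def implied_timescale(eigenvalue, lag_time):
    """Implied timescale t = -lag_time / ln|eigenvalue| for a non-stationary eigenvalue."""
    magnitude = abs(eigenvalue)
    if not 0.0 < magnitude < 1.0:
        raise ValueError("eigenvalue magnitude must lie strictly between 0 and 1")
    return -lag_time / math.log(magnitude)


def two_state_slow_eigenvalue(p_ab, p_ba):
    """Second eigenvalue of the row-stochastic matrix [[1 - p_ab, p_ab], [p_ba, 1 - p_ba]]."""
    return 1.0 - p_ab - p_ba


lag_time = 10.0  # hypothetical lag time, in arbitrary time units
slow_eigenvalue = two_state_slow_eigenvalue(0.02, 0.03)  # hypothetical transition probabilities
timescale = implied_timescale(slow_eigenvalue, lag_time)
```

In a full analysis this calculation is repeated over a range of lag times, and timescales that level off as the lag time grows are usually read as a sign that the discretization resolves the slow processes.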

jchodera commented 8 years ago

I'm going to check this in now because I finally have something that works and the scripts may be useful to others.