jchodera closed 8 years ago
So do you still think there's something wrong with the clustering in PyEMMA? You mentioned you were getting weird results on this dataset too? I would love to do something proved to work ok on the SETD8 dataset.
I suspect there is. I think we should test on something simple like the alanine dipeptide dataset from MSMBuilder.
ok!
@rafwiewiora: Are you or @maxentile able to take a stab at testing minRMSD clustering on an alanine dipeptide dataset, or should I do that?
Hi, what exactly are we interested in testing here? Regular-time clustering will probably be fragile, regardless of metric...
To do the same analysis with the alanine dipeptide dataset in msmbuilder, we can insert:

    from msmbuilder.example_datasets import AlanineDipeptide
    trajs = AlanineDipeptide().get().trajectories
    trajectory_filenames = []
    for i, traj in enumerate(trajs):
        fname = 'alanine_{0}.h5'.format(i)
        trajectory_filenames.append(fname)
        traj.save_hdf5(fname)

before line 36 in cluster.py.
Are we interested in checking the correctness of the pyemma implementation? Or some measure of the quality of the resulting discretization?
When looking at coarse-graining algorithms a few months ago, I collected some results on this dataset with a different clustering algorithm (k-medoids) but the same metric (minRMSD), in case this is of interest here: https://github.com/maxentile/automatic-state-decomposition/blob/master/decompose-py/Alanine%20benchmark%20%2B%20performance%20comparison.ipynb
Thanks, @maxentile! To be clearer here: The resulting timescales from my minRMSD clustering of 1M CK2 datapoints were so poor that they were highly reminiscent of earlier bugs in my own minRMSD code, where the bugs caused the code to produce incorrect Voronoi partitions of configuration space. I wanted to be sure of the following things:
That code snippet will very much help!
That notebook with K-medoids minRMSD clustering gives a beautiful implied timescales plot, by the way.
I'm going to check this in now because I finally have something that works and the scripts may be useful to others.
This PR checks in a script to do a very basic minRMSD-based clustering of CK2 with equitemporal generator selection.
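For anyone skimming, here is a rough sketch of what "equitemporal generator selection" plus nearest-generator assignment amounts to. It is not the script in this PR: the function names are made up for illustration, and the distance is a plain Euclidean placeholder (`math.dist`) standing in for minRMSD, so treat it as a toy under those assumptions. The last helper is the kind of cheap sanity check that catches a broken Voronoi partition: every generator should land in its own cell.

```python
import math


def equitemporal_generators(n_frames, n_clusters):
    """Pick generator frame indices evenly spaced in time (hypothetical helper)."""
    if not 0 < n_clusters <= n_frames:
        raise ValueError("need 0 < n_clusters <= n_frames")
    stride = n_frames // n_clusters
    return [i * stride for i in range(n_clusters)]


def assign_to_generators(frames, generator_indices, distance):
    """Voronoi assignment: label each frame with its nearest generator."""
    labels = []
    for frame in frames:
        dists = [distance(frame, frames[g]) for g in generator_indices]
        # Ties go to the first (earliest) generator.
        labels.append(dists.index(min(dists)))
    return labels


def generators_own_their_cells(generator_indices, labels):
    """Sanity check: generator k must be assigned to cell k.

    Assumes no two generators are identical frames; duplicates would tie
    at distance zero and fail this check.
    """
    return all(labels[g] == k for k, g in enumerate(generator_indices))


if __name__ == "__main__":
    # Toy 2D "frames" marching along the x axis; the real input would be
    # trajectory snapshots and the real distance would be minRMSD.
    frames = [(float(t), 0.0) for t in range(10)]
    gens = equitemporal_generators(len(frames), 3)
    labels = assign_to_generators(frames, gens, math.dist)
    print(gens, labels, generators_own_their_cells(gens, labels))
```

Swapping `math.dist` for a real minRMSD function is the only change needed to run the same check on actual snapshots, at the cost of one distance evaluation per frame per generator.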
Clustered snapshot identities are on hal here: