deeptime-ml / deeptime

Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation
https://deeptime-ml.github.io/
GNU Lesser General Public License v3.0

Cross Validation score for MaximumLikelihoodMSMs and BayesianMSMs #293

Closed prateekbansal97 closed 3 months ago

prateekbansal97 commented 3 months ago

Hello deeptime developers,

I would like to request a PyEMMA-style cross-validation score for MSMs (MaximumLikelihoodMSM, BayesianMSM). In PyEMMA this was a useful tool for plotting the errors in the VAMP score.

An implementation in pyemma looked like:

import pyemma

msm = pyemma.msm.estimate_markov_model(dtrajs=cluster_tica_dtrajs, lag=lagtime, score_method='VAMP2', score_k=5)
scores = msm.score_cv(dtrajs=cluster_tica_dtrajs)

If this cannot be added as a feature, I would appreciate guidance on how to calculate these scores with the current implementation.

P.S. Your tools are highly useful in general, thanks for the nice implementation!

Thanks!

clonker commented 3 months ago

Cheers, you are right, I have never added an example regarding that! My bad! For the time being, you can check this notebook: https://github.com/markovmodel/pyemma-workshop/blob/master/notebooks/02-io-features-hands-on.ipynb

The relevant bit is this:

import matplotlib.pyplot as plt
from deeptime.decomposition import TICA, vamp_score_cv

# lags, dim, bbtorsions and heavy_atom_distances are assumed to be defined earlier, as in the linked notebook.
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
labels = ['backbone\ntorsions', 'heavy atom\ndistances']
tica_estimator = TICA(lagtime=lags[0], dim=dim)

for ax, lag in zip(axes.flat, lags):
    # reuse one estimator and only update its lag time for each panel
    tica_estimator.lagtime = lag
    # cross-validated VAMP scores for each feature set (n=3 train/test splits)
    torsions_scores = vamp_score_cv(tica_estimator, trajs=bbtorsions, blocksplit=False, n=3)
    scores = [torsions_scores.mean()]
    errors = [torsions_scores.std()]
    distances_scores = vamp_score_cv(tica_estimator, trajs=heavy_atom_distances, blocksplit=False, n=3)
    scores += [distances_scores.mean()]
    errors += [distances_scores.std()]
    # one bar per feature set, with the standard deviation over splits as error bar
    ax.bar(labels, scores, yerr=errors, color=['C0', 'C1'])
    ax.set_title(r'lag time $\tau$={}ps'.format(lag))

axes[0].set_ylabel('VAMP2 score')
fig.tight_layout()

You can provide an MSM and/or Bayesian MSM estimator as well.

Reference: https://deeptime-ml.github.io/latest/api/generated/deeptime.decomposition.vamp_score_cv.html
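
For intuition about what a cross-validated VAMP-2 score measures for an MSM, here is a minimal NumPy-only sketch. It is not deeptime's implementation: it works on raw transition counts instead of a maximum-likelihood transition matrix, and all function names (`count_matrix`, `inv_sqrt`, `vamp2_train_test`, `vamp2_cv_sketch`) as well as the toy 3-state data are hypothetical and chosen for illustration only. The singular functions are fitted on one half of the trajectories and scored on the other half, and this is repeated over several random splits.

```python
import numpy as np


def count_matrix(dtrajs, lag, n_states):
    """Sliding-window counts: C[i, j] = number of times i at t is followed by j at t + lag."""
    C = np.zeros((n_states, n_states))
    for d in dtrajs:
        np.add.at(C, (d[:-lag], d[lag:]), 1.0)
    return C


def inv_sqrt(M, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix; near-zero eigenvalues are dropped."""
    w, Q = np.linalg.eigh(M)
    keep = w > eps
    return (Q[:, keep] / np.sqrt(w[keep])) @ Q[:, keep].T


def vamp2_train_test(train, test, lag, n_states, k):
    """Fit the top-k singular functions on `train`, return their VAMP-2 score on `test`."""
    C01 = count_matrix(train, lag, n_states)
    c0, c1 = C01.sum(axis=1), C01.sum(axis=0)
    # Whitening weights c^(-1/2); states never seen in training get weight 0.
    w0 = np.zeros(n_states)
    w1 = np.zeros(n_states)
    w0[c0 > 0] = c0[c0 > 0] ** -0.5
    w1[c1 > 0] = c1[c1 > 0] ** -0.5
    Uw, _, Vwt = np.linalg.svd(w0[:, None] * C01 * w1[None, :])
    U = w0[:, None] * Uw[:, :k]   # left singular functions (training data)
    V = w1[:, None] * Vwt[:k].T   # right singular functions (training data)

    # Covariances of the fitted functions, evaluated on the test data.
    T01 = count_matrix(test, lag, n_states)
    t0, t1 = T01.sum(axis=1), T01.sum(axis=0)
    A = U.T @ (t0[:, None] * U)   # U^T C00_test U
    B = U.T @ T01 @ V             # U^T C01_test V
    C = V.T @ (t1[:, None] * V)   # V^T C11_test V
    return np.linalg.norm(inv_sqrt(A) @ B @ inv_sqrt(C), "fro") ** 2


def vamp2_cv_sketch(dtrajs, lag, n_states, k, n=5, seed=0):
    """Repeat a random 50/50 split over whole trajectories n times."""
    rng = np.random.default_rng(seed)
    half = len(dtrajs) // 2
    scores = []
    for _ in range(n):
        order = rng.permutation(len(dtrajs))
        train = [dtrajs[i] for i in order[:half]]
        test = [dtrajs[i] for i in order[half:]]
        scores.append(vamp2_train_test(train, test, lag, n_states, k))
    return np.array(scores)


# Toy data (assumption for illustration): six trajectories of a 3-state Markov chain.
rng = np.random.default_rng(42)
P = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.80, 0.10],
              [0.02, 0.08, 0.90]])


def sample_dtraj(n_steps):
    d = np.empty(n_steps, dtype=int)
    d[0] = rng.integers(3)
    for t in range(1, n_steps):
        d[t] = rng.choice(3, p=P[d[t - 1]])
    return d


dtrajs = [sample_dtraj(2000) for _ in range(6)]
scores = vamp2_cv_sketch(dtrajs, lag=1, n_states=3, k=2)
print(f"VAMP-2 (CV): {scores.mean():.3f} +/- {scores.std():.3f}")
```

The mean and standard deviation over splits play the same role as `scores` and `errors` in the plotting snippet above. With `k` singular functions the score lies between 1 and `k`, because the constant function always contributes a singular value of 1.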

prateekbansal97 commented 3 months ago

Hello!

Thanks for the reply. I was able to implement the suggestion.