deeptime-ml / deeptime

Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation
https://deeptime-ml.github.io/
GNU Lesser General Public License v3.0

Allowing weights for VAMP dimensionality reduction #259

Open jdrusso opened 1 year ago

jdrusso commented 1 year ago

Is your feature request related to a problem? Please describe.

The TICA and VAMP decomposition classes provide similar interfaces for .fit_from_timeseries(data), but only the TICA class accepts a weights argument.

The VAMP decomposition does not support weights and throws an error if they are provided (see: https://github.com/deeptime-ml/deeptime/blob/11182accb1f8ce263f7c498b76c94bb657b5a998/deeptime/covariance/util/_running_moments.py#L245 )

Describe the solution you'd like

Support for weights in VAMP.

I see some similarity between moments_XXXY() and moments_block(), but it seems like there was probably a reason for omitting support for weights from VAMP -- is that correct?

clonker commented 1 year ago

Hey JD,

that is an excellent question! I thought a little about this and I don't think there is anything that speaks against having weights for VAMP per se. It is just that the weighting was ordinarily used to reweight many short off-equilibrium trajectories to equilibrium statistics in conjunction with the KoopmanWeightingEstimator. May I ask what you want to achieve with the weights?

Cheers, Moritz

jdrusso commented 1 year ago

Hi Moritz,

Thanks for the response! Glad to hear there's no theoretical reason it's not doable. We're doing dimensionality reduction on sets of many off-equilibrium MD trajectories, using WESTPA weighted ensemble enhanced sampling. Weighted ensemble trajectories naturally carry weights with them, so we'd like to use those in the dimensionality reduction.

I've implemented weighted TICA with deeptime, but because we're often simulating unidirectional steady-state flows, I don't think the reversibility assumptions in TICA are appropriate, so we'd like to try VAMP.
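
To make the reversibility concern concrete, here is a minimal NumPy sketch. It is not from the thread: the 3-state cycle, its transition probabilities, and all variable names are made up for illustration, and it uses exact stationary covariances of one-hot features (not mean-removed) rather than deeptime estimators. It shows that a unidirectional flow gives an asymmetric time-lagged covariance, that symmetrizing it (as a reversible estimate effectively does) flattens the rotational part, and that an SVD-based, VAMP-style analysis keeps it.

```python
import numpy as np

# Hypothetical 3-state cycle 0 -> 1 -> 2 -> 0:
# advance with probability 0.8, stay put with probability 0.2.
P = 0.2 * np.eye(3) + 0.8 * np.roll(np.eye(3), 1, axis=1)
pi = np.full(3, 1.0 / 3.0)  # stationary distribution (uniform by symmetry)

# Exact covariances of one-hot state indicators (no mean removal).
C00 = np.diag(pi)
C0t = np.diag(pi) @ P
Ctt = np.diag(pi)

# A directed flow makes the time-lagged covariance asymmetric.
is_symmetric = np.allclose(C0t, C0t.T)

# Symmetrizing C0t corresponds to a chain that hops in both directions.
C0t_sym = 0.5 * (C0t + C0t.T)
inv_sqrt = np.diag(pi ** -0.5)
tica_like_eigs = np.linalg.eigvalsh(inv_sqrt @ C0t_sym @ inv_sqrt)

# VAMP-style: singular values of the whitened, non-symmetrized matrix.
vamp_svals = np.linalg.svd(inv_sqrt @ C0t @ inv_sqrt, compute_uv=False)

# Eigenvalues of the true transition matrix are complex (rotation).
true_eigs = np.linalg.eigvals(P)

print("C0t symmetric:", is_symmetric)
print("symmetrized eigenvalues:", tica_like_eigs)
print("VAMP-style singular values:", vamp_svals)
print("|eigenvalues of P|:", np.sort(np.abs(true_eigs))[::-1])
```

In this toy case the singular values match the moduli of the complex eigenvalues of P, while the symmetrized problem only recovers their real parts.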

clonker commented 1 year ago

Our covariance computation is a bit more complicated than the usual (X - mean(X)).T @ (X - mean(X)) / len(X) because of its online nature, so it might take a while until I get around to implementing this. In the meantime there is a bit of a hack that does the computation twice: you can use the Covariance estimator twice. Run it once on the non-lagged data to compute weighted XX and XY (make sure to set remove_data_mean=True), and once on the lagged data (meaning you skip the first "lagtime" frames, respecting stride if you use that), again removing the data mean. Then you have weighted XX, weighted YY, and unweighted cross covariance XY (this is never weighted). Finally, combine the two CovarianceModel instances into one by using the mean and covariance of the XX, XY model and the cov_00 of the second model as cov_tt. So in pseudocode:

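from deeptime.covariance import Covariance, CovarianceModel
from deeptime.decomposition import VAMP

# Pass 1: covariance of the instantaneous frames (cov_00) and cross-covariance (cov_0t).
est_instantaneous = Covariance(remove_data_mean=True, lagtime=100, compute_c00=True, compute_c0t=True, reversible=False, bessels_correction=False)
# Pass 2: covariance of the lagged frames only; its cov_00 is used as cov_tt below.
est_lagged = Covariance(remove_data_mean=True, compute_c00=True, reversible=False, bessels_correction=False)

# your_data_with_lagtime_100 is a placeholder: an iterable of (X, Y, weights_x, weights_y)
# batches in which Y holds the frames 100 steps after those in X.
for X, Y, weights_x, weights_y in your_data_with_lagtime_100:
    est_instantaneous.partial_fit((X, Y), weights=weights_x)
    est_lagged.partial_fit(Y, weights=weights_y)
model_inst = est_instantaneous.fetch_model()
model_lagged = est_lagged.fetch_model()

# Stitch both passes into a single model that VAMP can consume.
model_combined = CovarianceModel(
    cov_00=model_inst.cov_00,
    cov_0t=model_inst.cov_0t,
    cov_tt=model_lagged.cov_00,
    mean_0=model_inst.mean_0,
    # est_lagged only ever sees Y, so the mean of the lagged frames should be stored as its mean_0
    mean_t=model_lagged.mean_0,
    bessels_correction=model_inst.bessels_correction,
    lagtime=model_inst.lagtime,
    symmetrized=False,
    data_mean_removed=True
)

VAMP().fit(model_combined)
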
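
For reference, the batch (non-online) version of the weighted, mean-free covariances that the two passes above are meant to accumulate could be sketched in plain NumPy as below. This is only an illustration under stated assumptions: weighted_covariances is a hypothetical helper, not part of deeptime; weights_x is applied to both the XX and the XY term, as a weighted pass over (X, Y) pairs would do; and the random trajectory and weights are made up.

```python
import numpy as np

def weighted_covariances(X, Y, w_x, w_y):
    """Weighted, mean-free C00, C0t, Ctt for time-lagged pairs (X[i], Y[i]).

    Hypothetical helper for illustration; weights are normalized to sum to 1.
    """
    w_x = np.asarray(w_x, dtype=float)
    w_y = np.asarray(w_y, dtype=float)
    w_x = w_x / w_x.sum()
    w_y = w_y / w_y.sum()
    mean_0 = w_x @ X                      # weighted mean of instantaneous frames
    mean_t = w_y @ Y                      # weighted mean of lagged frames
    X0, Y0 = X - mean_0, Y - mean_t
    c00 = (X0 * w_x[:, None]).T @ X0      # weighted XX
    c0t = (X0 * w_x[:, None]).T @ Y0      # XY, weighted by the weights of X (assumption)
    ctt = (Y0 * w_y[:, None]).T @ Y0      # weighted YY
    return c00, c0t, ctt, mean_0, mean_t

# Made-up data: one trajectory with per-frame weights and a lag of 10 frames.
rng = np.random.default_rng(0)
traj = rng.normal(size=(500, 3))
weights = rng.uniform(0.1, 1.0, size=len(traj))
lag = 10
X, Y = traj[:-lag], traj[lag:]
c00, c0t, ctt, mean_0, mean_t = weighted_covariances(X, Y, weights[:-lag], weights[lag:])
```

With uniform weights this reduces to the usual (X - mean(X)).T @ (X - mean(X)) / len(X), which is a quick sanity check for any weighted implementation.
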
jdrusso commented 1 year ago

Thanks, I think I can work with this! That pseudocode is really helpful to see, I appreciate you taking the time to share it.

clonker commented 1 year ago

Hi @jdrusso did you have a chance to implement this?

jdrusso commented 1 year ago

Thanks for checking in -- unfortunately I haven't, I had to swap focus to some other things. I know @jpthompson17 was also interested, but I'm not sure if he's done anything with it since.