Closed pl992 closed 3 years ago
I'm sorry, I hadn't realized that the `observable_transform` parameter of `deeptime.decomposition.VAMP` allows exactly that trick (this time the documentation is really good!). I'll take the occasion to ask for clarification on this part of the VAMPNet documentation:
This estimator is also a transformer and can be used to transform data into the optimized space. From there it can either be used to estimate Markov state models via making assignment probabilities crisp (in case of softmax output distributions) or to estimate the Koopman operator using the VAMP estimator.
So from this it seems that evaluating the assignment probabilities using softmax output distributions isn't compatible with estimating the Koopman operator using the VAMP estimator. From the reference
Andreas Mardt, Luca Pasquali, Hao Wu, and Frank Noé. Vampnets for deep learning of molecular kinetics. Nature communications, 9(1):1–11, 2018.
I understood that both can be considered correct at the same time: you build the Koopman operator using the VAMP estimator, and the assignment of each 'frame' to a state is defined by the softmax output distributions. So essentially my question is: once I've built a low-resolution MSM following the documentation precisely:
```python
from deeptime.decomposition import VAMP
from deeptime.clustering import KMeans
from deeptime.markov.msm import MaximumLikelihoodMSM

vamp_model_vnet = VAMP(lagtime=1, observable_transform=model).fit(traj).fetch_model()
projection = vamp_model_vnet.transform(traj)
dtraj_rec = KMeans(2).fit(projection).transform(projection)
msm = MaximumLikelihoodMSM().fit(dtraj_rec, lagtime=1).fetch_model()
```
how can I find the assignment probability of each frame to a specific 'metastable state'?
Thanks for your help!
The softmax output distributions can be understood as a kind of prior on your network (as in: you expect the data to possess a set of N metastable states). That will give you a fuzzy assignment probability distribution for each frame. If you do not use a softmax, you generally won't obtain such a distribution. However, it is still a perfectly fine Koopman model that you can work with as usual.
In case you go to Markov state models, you give up on the fuzzy assignment probabilities and make them crisp. This can be obtained by using clustering (as performed in the documentation) or, in case of a softmax output, by using the argmax over each transformed frame, yielding the most likely state.
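The two kinds of assignment can be illustrated with a small numpy sketch; the probabilities below are made up for illustration and stand in for the softmax output of the network:

```python
import numpy as np

# Hypothetical softmax output of a VAMPNet lobe: one row per frame,
# one column per metastable state; each row sums to 1.
probs = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.2, 0.8],
])

# Fuzzy assignment: each row is the membership distribution of that frame.
fuzzy_frame0 = probs[0]

# Crisp assignment: argmax over each transformed frame gives the most
# likely state, which is what an MSM estimation would use.
crisp = probs.argmax(axis=1)
```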
So to answer your question(s):
```python
MLP(units=[traj.shape[1], ..., 2], nonlinearity=nn.ReLU, output_nonlinearity=nn.Softmax)
```
for, in this case, two metastable states. You can find an example here: https://github.com/markovmodel/pyemma-workshop/blob/master/notebooks/08-vampnets-session.ipynb We will likely include it in the deeptime documentation shortly to make things clearer.
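To illustrate the shapes involved, here is an untrained numpy sketch of what such a lobe computes; the real lobe is a torch module, and the weights and layer sizes here are purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random, untrained stand-in for an MLP with units=[2, 20, 2], a ReLU
# hidden layer and a Softmax output; weights are purely illustrative.
W1, b1 = rng.normal(size=(2, 20)), np.zeros(20)
W2, b2 = rng.normal(size=(20, 2)), np.zeros(2)

def lobe(x):
    h = np.maximum(x @ W1 + b1, 0.0)                    # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)             # softmax over states

frames = rng.normal(size=(5, 2))
probs = lobe(frames)  # shape (5, 2): per-frame state probabilities
```

The softmax output is what turns the two output nodes into a probability distribution over the two metastable states for each frame.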
Thanks, this is helpful. I do have a question regarding the notebook: in the 2D toy model paragraph the VAMPNet is used with 2 output nodes, meaning there would be 2 output states, which is exactly the dimension of `transformed_data = model.transform(data)`. Following the notebook, it builds a VAMP model as follows:
```python
import deeptime as dt

estimator = dt.decomposition.VAMP(lagtime=lag, observable_transform=model)
vamp_model = estimator.fit(data).fetch_model()
```
Now I would expect this to yield a 2x2 Koopman operator, since the number of output states of the network was 2, but here it has only one dimension. Is something being discarded? I've also tried setting `var_cutoff=1`, but nothing changed. Thank you again, and sorry for the many questions.
In the 2D toy model the VAMPNet has only one output node, which captures all of the variance. The two output nodes were hypothetical, in case you were using a softmax output. So in that case your Koopman matrix is a 1x1 matrix. :slightly_smiling_face:
First, thanks again for the very quick answer. I'm sorry, but I'm more confused now: in the notebook the nodes are
```python
nodes = [2, 20, 20, 20, 20, 2]
```
and the fit is called as follows:
```python
lobe = MLP(units=nodes, nonlinearity=nn.ReLU, output_nonlinearity=nn.Softmax)
```
so it is using the softmax output, and as I understood it, the output dimension would be two (the same as the last entry of `nodes`). It transforms the data into a 2D output, but when I transform it using VAMP it has only one dimension.
Removing the `output_nonlinearity` parameter, I obtain 2D VAMP-transformed data, but I cannot understand this behavior, since I would expect the same dimension regardless of this parameter.
Ah yes, now I understand what you mean. In this two-state probability distribution everything is already captured in the first component; the second is just `1 - first_component`, otherwise you wouldn't get probabilities. So there is a strong linear dependence. That means it will be pruned away and you are again left with one dimension.
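This linear dependence can be checked numerically; a small numpy sketch with made-up probabilities:

```python
import numpy as np

# Two-state softmax outputs: the second column is exactly 1 - the first.
first = np.array([0.9, 0.4, 0.2, 0.7])
probs = np.column_stack([first, 1.0 - first])

# After mean-centering, the two columns are perfectly anti-correlated, so
# the covariance matrix has rank 1: only one usable dimension remains.
centered = probs - probs.mean(axis=0)
rank = np.linalg.matrix_rank(centered.T @ centered)
```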
Oh, I see it now! So the VAMP model from a model using softmax always has D-1 dimensions, since the last dimension is always 1 minus the other components. Although this may be obvious, it gets a little confusing if it's not explicitly stated. Thank you very much!
Yes, that is right :slightly_smiling_face: As a final note, you may also get higher-dimensional VAMP models if you have more than two output states in the VAMPNet. By no means do you always end up with a one-dimensional model (would be kind of cool though :grinning:)
I misread, I thought you wrote 1D instead of D-1. Not enough coffee today. In any case, I am happy that I could help you!
I contacted you a few days ago regarding the VAMPNet usage. I solved everything, but I think there's something missing. The references report that you can evaluate the ITS and the CK test directly from the Koopman matrix built on the data transformed by your VAMPNet. Starting from these data it isn't difficult to evaluate the Koopman operator, the ITS, and the Chapman-Kolmogorov test on your own, but for completeness I would ask for these functions to be added, so that users could rely on your implementation, which is surely much more robust and compact.
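For reference, the implied timescales follow directly from the Koopman eigenvalues; a sketch with a made-up spectrum (the lag time and eigenvalues below are purely illustrative):

```python
import numpy as np

lag = 10                                  # hypothetical lag time
eigenvalues = np.array([1.0, 0.9, 0.5])   # made-up Koopman spectrum

# Implied timescales t_i = -lag / ln(|lambda_i|); the stationary
# eigenvalue 1 corresponds to an infinite timescale and is skipped.
its = -lag / np.log(np.abs(eigenvalues[1:]))
```

Eigenvalues closer to 1 yield longer timescales, i.e. slower processes.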
Again, congratulations on this huge and extremely well-organized library!