PGelss / scikit_tt

Tensor Train Toolbox
GNU Lesser General Public License v3.0

Documentation For Data Format #25

Open zacharycbrown opened 2 years ago

zacharycbrown commented 2 years ago

Dear Dr. Patrick Gelß,

Thank you for making this repository available!

How might one apply the tgEDMD algorithm to datasets with multiple simulations?

More specifically, I have a dataset consisting of S simulations of multi-dimensional time series, each with shape (d, m) if I understand this repository's naming convention correctly; that is, my dataset has shape (S, d, m). Based on what I've been able to find in this repository, the amuset_hosvd method requires the input data to be of shape (d, m). Does this mean I need to either flatten my dataset, folding the S dimension into one of the others, or run the amuset_hosvd method once for each simulation?

I am hoping to use the tgEDMD algorithm to simultaneously model the information gleaned across all simulations if at all possible, so any guidance towards that end would be greatly appreciated!

Thank you!

PGelss commented 2 years ago

Dear Zachary,

Thanks for your interest in the Scikit-TT toolbox.

You are absolutely correct. If you want to apply tgEDMD to all snapshots at once, you first have to reshape your data tensor. Suppose X is your dataset of shape (S, d, m), where S is the number of simulations, d is the dimension of the state space, and m is the number of snapshots per simulation. Then you can use X.transpose([1, 0, 2]).reshape([d, S*m]) as input for amuset_hosvd.
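
As a minimal sketch of that reshape, assuming X is a NumPy array: the sizes and the dummy data below are made up for illustration, and the call to amuset_hosvd itself is left out because its remaining arguments are not discussed here. The check confirms that transposing first is equivalent to placing the simulations side by side along the snapshot axis.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from any real dataset)
S, d, m = 3, 4, 5

# Dummy data of shape (S, d, m): S simulations, d state dimensions, m snapshots
X = np.arange(S * d * m, dtype=float).reshape(S, d, m)

# Move the state dimension to the front, then merge (S, m) into one snapshot axis
data = X.transpose([1, 0, 2]).reshape([d, S * m])

# Reference: put the S simulations next to each other along the snapshot axis
expected = np.concatenate([X[s] for s in range(S)], axis=1)

# Reshaping without the transpose has the right shape but mixes up the entries
wrong = X.reshape([d, S * m])

print(data.shape)                      # shape of the array to pass on
print(np.array_equal(data, expected))  # whether both constructions agree
```

Note that X.reshape([d, S*m]) without the preceding transpose would also run without error, but its rows would no longer correspond to state-space coordinates.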

The dataset for ala10 consists of 6 independent simulations, which can simply be concatenated because we do not consider any correlations between different time steps when applying tgEDMD.
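
A small sketch of such a concatenation, assuming each simulation is stored as a separate NumPy array of shape (d, m_s). The trajectory lengths and random data are hypothetical, and allowing different lengths per simulation is an assumption beyond what is stated above; it works here only because the snapshots are treated independently.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                 # state-space dimension (illustrative)
lengths = [5, 7, 6]   # hypothetical number of snapshots per simulation

# One array of shape (d, m_s) per simulation, filled with placeholder data
trajectories = [rng.standard_normal((d, m_s)) for m_s in lengths]

# Stack all snapshots along the second axis: shape (d, sum of lengths)
data = np.concatenate(trajectories, axis=1)

print(data.shape)
```

The order of the simulations in the list only determines the column order of the result.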

I hope this helps. Let me know if you have further questions.

Best, Patrick