DMD for batched trajectories/snapshots

FlowModelingControl / flowtorch

flowTorch - a Python library for analysis and reduced-order modeling of fluid flows

GNU General Public License v3.0


DMD for batched trajectories/snapshots #46

Closed thibmonsel closed 6 months ago

thibmonsel commented 6 months ago

Thanks for the great library! Regarding the documentation page "An introduction to DMD": is there a batched equivalent of the DMD example given there?

In deep learning in general, we would more often than not have a batch of trajectories.

dmd = DMD(data_matrix) # here data_matrix is of shape [Nt, #features]

The only way I see to do this, and a slow one, would be a for loop over the batch?

AndreWeiner commented 6 months ago

Hi @thibmonsel, I may not understand your question correctly. The DMD class fits a DMD model to a given data matrix, where each column of the data matrix is formed by a snapshot (e.g., the pressure field at a given time instance reshaped into a long vector), and the snapshots should be sampled at constant time intervals. Would you like to fit the DMD incrementally, e.g., because the data matrix would be very large, or would you like to fit multiple independent DMD models to different data matrices? Best regards, Andre

thibmonsel commented 6 months ago

Thanks for getting back to me @AndreWeiner, I'm sorry if my message wasn't clear enough. My goal here is to use DMD to see whether it has predictive potential on a dataset of several trajectories (or snapshots).

My current setup is that I have a batch of trajectories, where each trajectory simulates a certain physical process (let's say a pressure field, for example). Therefore, my dataset is of shape [N, N_t, #features], where N is the number of trajectories, N_t is the number of time steps of the physical process, and #features is the number of features (e.g., a pressure field).

How would you use DMD on this sort of dataset ?

Would you suggest doing independent DMDs on each trajectory of my dataset? If that's the case, would the DMDs have most of their eigenvalues and eigenvectors close to each other?
If I did a DMD on my whole dataset, I would probably have to reshape it to [N_t, N * #features] and fit the DMD, but wouldn't that be computationally expensive?

AndreWeiner commented 6 months ago

Let's go through a hypothetical dataset just to be sure. Let's say you get your data from a simulation where you know one or more fields (pressure, velocity, temperature, ... - features?) at $M$ different locations (e.g., cell centers). Of these fields, you save $N$ snapshots at evenly spaced time intervals. There are multiple options for employing DMD in this scenario:

1. you create one independent DMD model for each field; these models likely show similar spectra because they are connected by physical equations; however, there is no guarantee that the models have similar properties (e.g., prediction accuracy, spectra, ...)
2. you create one big DMD model for all fields; in this case, the fields should be normalized such that the value range and the units match (you don't want to add $10^5$ Pa to $5$ m/s); then, all fields at a given time are stacked into a long column vector

For case 1., the data matrix for each field should be of size $M\times N$. For case 2., the data matrix should be of size $FM\times N$, where $F$ is the number of scalar fields (e.g., $F=4$ for pressure (1) and velocity (3)). In case 2., the eigenvalues will be the same for all fields because all fields share a single model. Variant 1 will be slightly less expensive in terms of memory. The overall cost really depends on the size of the data matrix (typically $M$ poses the largest challenge).
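As a rough illustration of the two layouts, the NumPy sketch below builds an $M\times N$ matrix for a single field and an $FM\times N$ matrix for all fields stacked together. The field values, the sizes, and the min-max scaling in the hypothetical `normalize` helper are assumptions made for the example; other normalizations may suit real data better.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 50, 20  # M locations, N snapshots (hypothetical sizes)

# hypothetical fields, each stored with snapshots as columns
pressure = 1.0e5 + 100.0 * rng.standard_normal((M, N))  # Pa
velocity = 5.0 + rng.standard_normal((3, M, N))         # m/s, 3 components


def normalize(field):
    """Scale a field to the range [-1, 1] using its global min and max."""
    fmin, fmax = field.min(), field.max()
    return 2.0 * (field - fmin) / (fmax - fmin) - 1.0


# case 1: one data matrix per field, each of size M x N
dm_pressure = normalize(pressure)

# case 2: stack all F = 4 normalized scalar fields along the rows -> FM x N
fields = [pressure, velocity[0], velocity[1], velocity[2]]
dm_all = np.concatenate([normalize(f) for f in fields], axis=0)

print(dm_pressure.shape, dm_all.shape)
```

Here each velocity component is scaled on its own; scaling the whole velocity vector jointly would be an equally plausible choice.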

To make predictions, use the DMD.predict(...) function.
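To show what a DMD-based prediction amounts to, here is a minimal NumPy sketch of a rank-truncated exact DMD with a prediction step. It is not flowTorch's implementation; the helper names `fit_dmd` and `predict` and the synthetic test data are assumptions for illustration only.

```python
import numpy as np


def fit_dmd(data_matrix, rank):
    """Fit a rank-truncated exact DMD; columns of data_matrix are snapshots."""
    X, Y = data_matrix[:, :-1], data_matrix[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # reduced linear operator: A_tilde = U^H Y V S^-1
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)
    # exact DMD modes: Phi = Y V S^-1 W
    modes = Y @ Vh.conj().T / s @ W
    return eigvals, modes


def predict(eigvals, modes, initial_condition, n_steps):
    """Advance initial_condition by n_steps; returns n_steps + 1 snapshots."""
    # mode amplitudes from a least-squares fit to the initial condition
    b = np.linalg.lstsq(modes, initial_condition, rcond=None)[0]
    powers = eigvals[:, None] ** np.arange(n_steps + 1)
    return (modes @ (b[:, None] * powers)).real


# synthetic data: a decaying 2D rotation embedded in 10 dimensions
rng = np.random.default_rng(0)
theta, decay = 0.3, 0.95
R = decay * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta), np.cos(theta)]])
P = rng.standard_normal((10, 2))
z, snapshots = np.array([1.0, 0.0]), []
for _ in range(30):
    snapshots.append(P @ z)
    z = R @ z
data = np.stack(snapshots, axis=1)  # shape 10 x 30

eigvals, modes = fit_dmd(data, rank=2)
prediction = predict(eigvals, modes, data[:, 0], n_steps=data.shape[1] - 1)
print(np.abs(prediction - data).max())
```

Because the synthetic data are exactly linear and of rank 2, the reconstruction error should be close to machine precision; real flow data will generally not behave this nicely.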

Best, Andre

thibmonsel commented 6 months ago

Thanks for the detailed explanation! To clarify the hypothetical dataset even further: 1/ you get your data from a simulation where you know one or more fields (pressure, velocity, temperature, ... - features?) at $M$ different locations, 2/ of these fields, you save $N$ snapshots at evenly spaced time intervals, 3/ AND I repeat this process $K$ times, so what I have is $K$ data matrices of shape $[N, M, F]$, i.e., a dataset of shape $[K, N, M, F]$ (the dimensions correspond to the $K$ samples of aggregated snapshots, the $N$ snapshots, the $M$ cell centers, and the $F$ features).

This setup is slightly different from what you mentioned, and I was wondering whether I should still tackle each sample of aggregated snapshots independently, as you suggested, or do something else. Kind regards.

AndreWeiner commented 6 months ago

I see. There are two options then (using the pyDMD package):

1. ensemble DMD: you build the data matrices $\mathbf{X}$ and $\mathbf{Y}$ (time-shifted) independently by stacking the $K$ independent trajectories along the column direction as in the example
2. if some parameters vary between the $K$ trajectories, parametric DMD would be a better option
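The ensemble idea can be sketched in plain NumPy as follows, assuming a dataset of shape $[K, N, M, F]$ as described above. This does not use the pyDMD API; the shared linear operator `A_true` and all sizes are hypothetical and only serve to generate test data.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, M, F = 8, 25, 6, 2  # trajectories, snapshots, locations, features

# hypothetical linear dynamics shared by all K trajectories
A_true = 0.9 * np.linalg.qr(rng.standard_normal((M * F, M * F)))[0]

# synthetic dataset of shape [K, N, M, F] with random initial states
dataset = np.empty((K, N, M, F))
for k in range(K):
    x = rng.standard_normal(M * F)
    for n in range(N):
        dataset[k, n] = x.reshape(M, F)
        x = A_true @ x

# flatten space and features, then put snapshots into columns: [K, MF, N]
trajectories = dataset.reshape(K, N, M * F).transpose(0, 2, 1)

# build X and Y per trajectory so that no snapshot pair crosses a
# trajectory boundary, then stack the K blocks along the columns
X = np.concatenate([t[:, :-1] for t in trajectories], axis=1)
Y = np.concatenate([t[:, 1:] for t in trajectories], axis=1)

# least-squares fit of one linear operator to all trajectories
A_fit = Y @ np.linalg.pinv(X)

print(X.shape, Y.shape)
```

The important detail is that the time shift is applied within each trajectory before stacking; shifting the concatenated matrix would pair the last snapshot of one trajectory with the first snapshot of the next.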

Best, Andre

thibmonsel commented 6 months ago

Thanks for pointing all of this out, I'll close the issue now. Best