
DMD

This is a repository for developing dynamic mode decomposition (DMD) methods for noisy climate data.

Dynamic mode decomposition is a family of data-driven, physics-based machine learning techniques for uncovering coherent spatio-temporal structures in data.
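
To make the idea concrete, here is a minimal sketch of exact DMD in NumPy. It assumes the data are arranged as a matrix `X` whose columns are successive snapshots; this is an illustrative reference implementation, not necessarily the variant developed in this repo:

```python
import numpy as np

def exact_dmd(X, r):
    """Minimal exact DMD on a snapshot matrix X of shape (N_x, N_t).

    Returns the DMD eigenvalues and modes for a rank-r approximation.
    """
    X1, X2 = X[:, :-1], X[:, 1:]                 # time-shifted snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T   # rank-r truncation
    A_tilde = U.conj().T @ X2 @ V / s            # low-rank linear operator
    eigvals, W = np.linalg.eig(A_tilde)          # temporal dynamics
    modes = X2 @ V / s @ W                       # exact DMD modes
    return eigvals, modes
```

Each eigenvalue encodes the growth/decay rate and oscillation frequency of the corresponding spatial mode.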

Here is an example of a fit to synthetic data generated from a PDE. More details are in /notebooks/Demo.ipynb. On the left is the ground truth; on the right is the DMD prediction:

[Demo figure: ground truth (left) vs. DMD prediction (right)]

Install

To install the dependencies from the pyproject.toml file, run the following commands (assuming macOS) to create a virtual environment and install the package, along with its dependencies, in editable mode.

For venv users:

```bash
git clone https://github.com/ClimeTrend/DMD.git
cd DMD
python -m venv venv
source venv/bin/activate
pip install -e .
```

And for conda users:

```bash
git clone https://github.com/ClimeTrend/DMD.git
cd DMD
conda create --name myenv
conda activate myenv
conda install pip
pip install -e .
```

Note that DMD does not require batching the training data in the deep-learning sense.

Directory /modules contains .py dependencies.

Directory /notebooks contains research notebooks.

Directory /data contains small, sparse datasets used for testing.

I (@pyatsysh) recommend starting by running /notebooks/Demo.ipynb. This notebook contains a minimal example of applying DMD to a climate-like dataset; the data are generated from an advection-diffusion PDE.
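
For a feel for what such synthetic data looks like, below is a small, self-contained sketch that generates snapshots from a 1D advection-diffusion PDE with an explicit finite-difference scheme. All parameter values are illustrative assumptions, not the settings used in Demo.ipynb:

```python
import numpy as np

# Illustrative parameters (assumptions, not those used in Demo.ipynb)
nx, nt = 128, 400                     # grid points and number of snapshots
dx, dt = 1.0 / 128, 1e-4
c, D = 1.0, 0.01                      # advection speed and diffusivity

x = np.linspace(0.0, 1.0, nx, endpoint=False)
u = np.exp(-100.0 * (x - 0.5) ** 2)   # Gaussian initial condition
snapshots = np.empty((nx, nt))        # snapshot matrix of shape (N_x, N_t)

for k in range(nt):
    snapshots[:, k] = u
    # periodic central differences for u_x and u_xx
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    # explicit Euler step of u_t = D * u_xx - c * u_x
    u = u + dt * (D * u_xx - c * u_x)
```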

The following capabilities have been implemented so far (some re-implement functionality found in PyDMD):

Data model

DMD algorithms natively work with "snapshot matrices" of shape (N_x, N_t), where N_x is the number of pixels and N_t is the number of observations. In addition, the user typically provides the corresponding array of observation times, of shape (N_t,).
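
As a concrete example (with made-up sizes), a stack of daily images can be flattened into a snapshot matrix like this:

```python
import numpy as np

# Hypothetical sizes: one year of daily (ny, nx) images
N_t, ny, nx = 365, 32, 64
images = np.random.rand(N_t, ny, nx)   # stand-in for real observations

# Flatten each image into a column: snapshot matrix of shape (N_x, N_t)
X = images.reshape(N_t, ny * nx).T
t = np.arange(N_t, dtype=float)        # observation times, shape (N_t,)
```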

Climate data is naturally periodic (over years, months, days, etc.). Presently, it is assumed that the training data comes in the form of daily images, stored as ndarrays of shape (ny, nx), and that daily observations are available for N_years.

Thus, the training data is a list of length N_years, where each element is an ndarray of shape (N_days, ny, nx). A point in time is identified by its year and its day within that year. Different "years" may have different numbers of days.
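
The following sketch illustrates this layout; all sizes are placeholders:

```python
import numpy as np

ny, nx = 32, 64
days_per_year = [365, 366, 365]        # "years" may differ in length
training_data = [np.zeros((n, ny, nx)) for n in days_per_year]

# A point in time is addressed by (year, day within year):
year, day = 1, 200
snapshot = training_data[year][day]    # a single (ny, nx) image
```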

Short Term Plan: