VegeWaterDynamics / motrainer

Distributed Measurement Operator Trainer for Data Assimilation Applications
https://vegewaterdynamics.github.io/motrainer/
Apache License 2.0

Example notebook for DNN training #88

Closed by rogerkuou 11 months ago

rogerkuou commented 1 year ago

After the model training split and the train-test split, we will have Dask bags of DataFrames. These DataFrames can also be used for DNN training, as in the example Python script.

Currently, when training on a large number of grid cells, we parallelize the workflow by submitting each grid cell as a separate SLURM job, as in this example. We can improve this by using Dask's SLURMCluster instead of submitting directly via SLURM. This way one can reserve a continuous block of time on the HPC resources and is less likely to be interrupted by waiting in the SLURM queue.
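
As a rough illustration of the SLURMCluster idea, here is a minimal sketch assuming the `dask-jobqueue` package is installed on the HPC system. The queue name, core count, memory, and walltime are placeholder values, and `start_slurm_client` is a hypothetical helper, not something that exists in motrainer.

```python
# Hypothetical resource settings; adjust them to the actual HPC system.
SLURM_KWARGS = {
    "queue": "normal",       # assumed partition name
    "cores": 4,              # cores per SLURM job
    "processes": 1,          # Dask worker processes per SLURM job
    "memory": "16GB",        # memory per SLURM job
    "walltime": "02:00:00",  # how long each job keeps its resources
}


def start_slurm_client(n_jobs, **overrides):
    """Start a Dask cluster backed by SLURM jobs and return (cluster, client)."""
    # Imported lazily so this module also loads where dask-jobqueue is absent.
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(**{**SLURM_KWARGS, **overrides})
    cluster.scale(jobs=n_jobs)  # ask SLURM for n_jobs jobs, each hosting Dask workers
    client = Client(cluster)
    return cluster, client


# Usage on a login node (not executed here):
#   cluster, client = start_slurm_client(n_jobs=8)
#   ... submit per-grid-cell work through `client` ...
#   client.close(); cluster.close()
```

With this setup the SLURM jobs are requested once and then stay alive for the walltime, so many grid cells can be processed by the same workers instead of each cell waiting in the queue as its own job.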

Goal: have 1) an example Jupyter notebook and 2) an example Python script to:

  1. Load the example data into Xarray, following this example
  2. Split the data 1) per grid cell for training and 2) into train and test datasets, following this example (https://github.com/VegeWaterDynamics/motrainer/tree/generalization/exploration)
  3. Train on the split data following the example Python script, in parallel per grid cell using Dask SLURMCluster
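
The per-grid-cell fan-out in steps 2 and 3 could look roughly like the sketch below. It uses only the standard library: a `ThreadPoolExecutor` stands in for the Dask client (both expose `submit(fn, *args)` returning a future with `.result()`), plain lists of `(input, target)` pairs stand in for the DataFrames, and `train_one_cell` is a hypothetical placeholder that fits a constant predictor instead of a DNN.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def split_train_test(records, train_fraction=0.7):
    """Split one grid cell's time-ordered records into train and test parts."""
    n_train = round(len(records) * train_fraction)
    return records[:n_train], records[n_train:]


def train_one_cell(cell_id, records):
    """Placeholder for the DNN training of a single grid cell.

    Fits a constant (mean) predictor on the training part and
    returns the cell id together with the RMSE on the test part.
    """
    train, test = split_train_test(records)
    mean = sum(y for _, y in train) / len(train)
    rmse = sqrt(sum((y - mean) ** 2 for _, y in test) / len(test))
    return cell_id, rmse


def train_all_cells(cells, executor):
    """Submit one task per grid cell and gather the results into a dict."""
    futures = [
        executor.submit(train_one_cell, cell_id, records)
        for cell_id, records in cells.items()
    ]
    return dict(future.result() for future in futures)


# Toy data: two grid cells, each with ten (input, target) pairs.
cells = {
    "cell_0": [(t, 1.0) for t in range(10)],
    "cell_1": [(t, float(t)) for t in range(10)],
}

with ThreadPoolExecutor(max_workers=2) as pool:
    scores = train_all_cells(cells, pool)
```

Because `train_all_cells` only relies on `submit` and `result`, a `dask.distributed.Client` connected to a SLURMCluster could in principle be passed in place of the thread pool, which is the design choice this sketch is meant to show.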

rogerkuou commented 12 months ago

Hi @SarahAlidoost, I assigned you here just to keep track of who is working on what. We can discuss details when you would like to start.

rogerkuou commented 11 months ago

solved by #97