NCAR / CUPiD

CUPiD is a “one stop shop” that enables and integrates timeseries file generation, data standardization, diagnostics, and metrics from all CESM components.
https://ncar.github.io/CUPiD/
Apache License 2.0

Add parallelization to notebooks #61

Closed: mnlevy1981 closed this issue 6 months ago

mnlevy1981 commented 7 months ago

As a first pass, I added a dask distributed LocalCluster to the ocean notebook. Running CUPiD on a casper compute node using 8 cores and 80 GB of memory, the ocean notebook took four or five minutes to run (instead of 10 minutes). Not the best speed-up, but the actual compute cells ran ~5x faster.

The atmosphere notebook runs in ~30 seconds, so I didn't bother adding parallelization there. The land notebook runs in ~1 minute, and adding a LocalCluster didn't improve the run time at all. Here's a table summarizing the timing experiments (runtimes are MM:SS):

| notebook | serial runtime | parallel runtime |
| --- | --- | --- |
| adf_quick_run | 00:27 - 01:28 | - |
| ocean_surface | 08:41 - 09:44 | 04:06 - 05:08 |
| land_comparison | 00:54 - 01:05 | 01:03 |

Note that I ran this four times:

- twice on a casper compute node with 8 cores and 80 GB of RAM (once with only the ocean notebook using LocalCluster, once with both the ocean and land notebooks using it)
- once on a casper compute node with 1 core and 10 GB of RAM
- once on a casper compute node with 1 core and 80 GB of RAM

In the 1-core runs all notebooks ran in serial, but in the 10 GB configuration it's very possible that everything slowed down simply because less memory was available.

mnlevy1981 commented 7 months ago

I am not entirely convinced that

```python
from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
client = Client(cluster)
```

is what we want... we might want to include an option to specify n_workers in the LocalCluster() call. At least with dask-mpi, the recommendation was to request N+2 cores from the queuing system if you wanted your cluster to have N workers: one core for running the code, one core for the dask scheduler, and then N cores for the workers. I'll test that out by running on 8 cores but setting n_workers=6, and if that shows improvement I'll solicit advice on how to include that in the config.yaml file.
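For reference, a minimal sketch of what that experiment might look like inside a notebook, assuming an 8-core allocation and following the N+2 sizing rule of thumb described above (the specific n_workers, threads_per_worker, and memory_limit values here are illustrative, not CUPiD's actual settings):

```python
from dask.distributed import Client, LocalCluster

# Assuming an 8-core / 80 GB allocation: leave one core for the notebook
# process and one for dask's scheduler, giving 6 cores to workers (N + 2).
cluster = LocalCluster(
    n_workers=6,
    threads_per_worker=1,
    memory_limit="10GB",  # per-worker limit; 6 x 10 GB stays under 80 GB
)
client = Client(cluster)

# ... run the dask-backed analysis cells ...

# Shut the cluster down once the heavy computation is finished.
client.close()
cluster.close()
```

With a LocalCluster everything runs on the single compute node, so leaving a couple of cores unassigned to workers mainly keeps the notebook process and scheduler responsive while the workers are busy.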

mnlevy1981 commented 7 months ago

I've added LocalCluster to seaice.ipynb (and also added some arguments to open_mfdataset() to further speed it up). I still want to split the README into a base README document and a "tips for running on the NCAR machines" document, and also update the new "tips for NCAR" page to mention requesting additional cores before running cupid-run.
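The comment doesn't spell out which open_mfdataset() arguments were added; a common recipe for speeding up multi-file opens looks roughly like the sketch below (the file glob, chunk sizes, and argument choices are assumptions for illustration, not necessarily what seaice.ipynb uses):

```python
import xarray as xr

# Hypothetical history-file pattern; actual CESM sea-ice output paths differ.
files = "history/*.cice.h.*.nc"

ds = xr.open_mfdataset(
    files,
    parallel=True,         # open files concurrently via dask
    data_vars="minimal",   # only concatenate variables containing the concat dim
    coords="minimal",
    compat="override",     # take coords/attrs from the first file instead of comparing
    chunks={"time": 12},   # dask chunking; a starting point to tune per dataset
)
```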

dabail10 commented 7 months ago

Nice! I will give it a whirl.

mnlevy1981 commented 7 months ago

@dabail10 -- I would run

```
$ qinteractive -l select=1:ncpus=12:mem=120GB
```

and then do the

```
$ conda activate cupid-dev
$ cupid-run config.yml
```

step on the allocation. (If you do this via JupyterHub, choose the Casper PBS Batch option and request 12 CPUs and 120 GB of memory.)

mnlevy1981 commented 6 months ago

2149459 just cleans up the documentation a bit. I moved the NCAR-specific tips regarding FastX out of README.md into a new NCAR_tips.md file, and then added a section about running in parallel. That markdown file is also included in the web documentation:

[Screenshot: the NCAR tips page rendered in the CUPiD web documentation]