I am not entirely convinced that

```python
cluster = LocalCluster()
client = Client(cluster)
```

is what we want... we might want to include an option to specify `n_workers` in the `LocalCluster()` call. At least with `dask-mpi`, the recommendation was to request N+2 workers from the queuing system if you wanted your cluster to be size N -- this provided a core for running the code, a core for the dask task manager, and then N cores for the workers. I'll test that out by running on 8 cores but setting `n_workers=6`, and if that shows improvement I'll solicit advice on how to include that in the `config.yaml` file.
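For concreteness, here is a minimal sketch of the variant I plan to test; the `n_workers=6` value is just the experiment described above, not a settled default:

```python
from dask.distributed import Client, LocalCluster

# Sketch: on an 8-core allocation, leave one core for the main process and
# one for the scheduler, following the dask-mpi "N+2" guidance above.
cluster = LocalCluster(n_workers=6)
client = Client(cluster)
```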
I've added `LocalCluster` to `seaice.ipynb` (and also added some arguments to `open_mfdataset()` to further speed it up). I still want to split the README into a base README document and a "tips for running on the NCAR machines" document, and also update the new "tips for NCAR" page to mention requesting additional cores before running `cupid-run`.
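For illustration, here is a sketch of the kind of `open_mfdataset()` arguments I mean; the path is a placeholder, and the exact arguments in `seaice.ipynb` may differ:

```python
import xarray as xr

# Placeholder path; the actual file pattern comes from the case being diagnosed.
ds = xr.open_mfdataset(
    "path/to/history/*.nc",
    parallel=True,          # open files concurrently via dask
    data_vars="minimal",    # only concatenate variables containing the concat dim
    coords="minimal",
    compat="override",      # skip per-file consistency checks
)
```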
Nice! I will give it a whirl.
@dabail10 -- I would run

```
$ qinteractive -l select=1:ncpus=12:mem=120GB
```

and then do the

```
$ conda activate cupid-dev
$ cupid-run config.yml
```

step on the allocation. (If you do this via JupyterHub, use Casper PBS Batch and request 12 CPUs and 120 GB of memory.)
2149459 just cleans up the documentation a bit. I moved the NCAR-specific tips regarding FastX out of `README.md` into a new `NCAR_tips.md` file, and then added a section about running in parallel. That markdown file is also included in the web documentation.
As a first pass, I added a dask distributed `LocalCluster` to the ocean notebook. Running CUPiD on a casper compute node using 8 cores and 80 GB of memory, the ocean notebook took four or five minutes to run (instead of 10 minutes). Not the best speed-up, but the actual compute cells ran ~5x faster. The atmosphere notebook runs in ~30 seconds, so I didn't bother adding parallelization there. The land notebook runs in ~1 minute, and adding a `LocalCluster` didn't improve the run time at all. Here's a table summarizing the timing experiments (runtimes are MM:SS):

[Table: per-notebook runtimes for `adf_quick_run`, `ocean_surface`, and `land_comparison` across the four configurations described below]
Note that I ran this four times... twice on a casper compute node with 8 cores and 80 GB of RAM (once with only the ocean notebook using `LocalCluster`, once with both ocean and land using it), once on a casper compute node with 1 core and 10 GB of RAM, and once on a casper compute node with 1 core and 80 GB of RAM (so in those last two runs all notebooks ran in serial, but in the 10 GB configuration it's very possible that everything slowed down due to having less memory available).