Closed: TeaganKing closed this 4 months ago
To elaborate on considerations for computational resource allocation:
And some more thoughts on environments:
Currently, notebooks are by default run in the environment specified by `default_kernel_name` under `computation_config` in `config.yml`. Each individual notebook can also specify its own environment under the key `kernel_name` in its entry under `compute_notebooks`. These environments must already be installed on the user's machine for this to work (this is checked before the notebooks are run).
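As a sketch of the layout described above (the notebook and environment names here are hypothetical, only the keys come from the text):

```yaml
# config.yml (sketch)
computation_config:
  default_kernel_name: cupid-analysis   # fallback environment for all notebooks

compute_notebooks:
  some_notebook:                        # hypothetical notebook entry
    kernel_name: cupid-special-env      # overrides the default for this notebook
```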
An idea that was floated at one point was having the notebooks run in the active environment by default (see https://github.com/rmshkv/nbscuid/issues/24).
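One minimal way that fallback could look, assuming a conda-based setup (this is a sketch of the idea in the linked issue, not code from the project):

```python
# Sketch: detect the currently active environment so it can serve as the
# default kernel when no kernel_name is configured.
import os
import sys


def active_env_name() -> str:
    """Best-effort name of the active conda/virtual environment.

    CONDA_DEFAULT_ENV is set by `conda activate`; otherwise fall back to
    the directory name of the running interpreter's prefix.
    """
    return os.environ.get("CONDA_DEFAULT_ENV", os.path.basename(sys.prefix))
```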
Another consideration is which environment `nbscuid` (or whatever we end up calling the main run engine) is installed in vs. which environment the notebooks need to run in. I've been keeping these separate, but in the future it would probably be best to have one common environment that contains all the necessary analysis packages as well as `nbscuid`, to minimize setup steps and confusion.
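Such a combined environment could be shipped as one of the files in `environments/`; something like the following (file name and package list are purely illustrative):

```yaml
# environments/combined-analysis.yml (hypothetical)
name: combined-analysis
channels:
  - conda-forge
dependencies:
  - python
  - jupyter
  - dask
  - xarray
  - pip
  - pip:
      - nbscuid  # or whatever the run engine ends up being called
```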
For parallelization, I think it would be useful to require users to request all compute resources ahead of time rather than having each notebook add additional jobs to the queue. To achieve that, we probably want to use `LocalCluster` objects inside every notebook (and specify in `config.yml` how big the local cluster should be). So the workflow on the NCAR machine would be: request `N` cores on Casper to run `cupid-run`, and then have the notebooks use some of those cores as dask workers.
It might be the case that the maximum size of the local cluster is `N-2` when submitting a job on `N` cores; in `dask-mpi`, one core is reserved to run the Python code itself, a second is reserved for the dask scheduler, and the rest of the cores can be workers. I suspect we will have to look at timing numbers and experiment with the configuration if we go this route.
We've been providing YAML files in `environments/` for a while now, and #61 introduced `LocalCluster` into a couple of notebooks (we also settled on the serial Ploomber executor for now).
In the bare-bones deployment, we need to be cognizant of engineering concerns, such as the following: