ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

Add support for configuring Dask distributed #2040

Open bouweandela opened 1 year ago

bouweandela commented 1 year ago

Since iris 3.6, it is possible to use Dask distributed with iris. This is a great new feature that will allow for better memory handling and distributed computing. See #1714 for an example implementation. However, it does require some extra configuration.
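For illustration, here is a minimal sketch of what this enables (the input file name is a placeholder): creating a distributed.Client registers it as the default Dask scheduler, so realizing lazy iris data then runs on the cluster.

from dask.distributed import Client

import iris

# Creating a Client starts a LocalCluster by default and registers it
# as the default scheduler for subsequent Dask computations.
client = Client(n_workers=2)

cube = iris.load_cube("tas.nc")  # placeholder file name
result = cube.collapsed("time", iris.analysis.MEAN)
print(result.data)  # realizing the data now runs on the cluster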

My proposal would be to allow users to specify the arguments to distributed.Client and to the associated cluster, e.g. distributed.LocalCluster or dask_jobqueue.SLURMCluster, in this configuration. This could be added either under a new key in config-user.yml or in a new configuration file in the ~/.esmvaltool directory:

Add to the user configuration file

We could add these new options to config-user.yml under a new key dask, e.g.

Example config-user.yml settings for running locally using a LocalCluster:

dask:
  cluster:
    type: distributed.LocalCluster

Example settings for using an externally managed cluster (e.g. one set up from a Jupyter notebook):

dask:
  client:
    address: tcp://127.0.0.1:45695

Example settings for running on Levante:

dask:
  client: {}
  cluster:
    type: dask_jobqueue.SLURMCluster
    queue: interactive
    account: bk1088
    cores: 8
    memory: 16GiB
    local_directory: "/work/bd0854/b381141/dask-tmp"
    n_workers: 2
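For illustration, a hedged sketch of how such a dask section might be consumed; get_distributed_client and the exact configuration layout are assumptions of this proposal, not existing ESMValCore API.

import importlib

from dask.distributed import Client


def get_distributed_client(dask_config):
    """Create a Dask client from the proposed configuration (a sketch)."""
    cluster_kwargs = dict(dask_config.get("cluster", {}))
    client_kwargs = dict(dask_config.get("client", {}))
    if "type" in cluster_kwargs:
        # Import the cluster class named in the config, e.g.
        # "dask_jobqueue.SLURMCluster" -> SLURMCluster.
        module_name, class_name = cluster_kwargs.pop("type").rsplit(".", 1)
        cluster_cls = getattr(importlib.import_module(module_name), class_name)
        cluster = cluster_cls(**cluster_kwargs)
        return Client(cluster, **client_kwargs)
    # No cluster section: connect to an externally managed scheduler
    # via client options such as "address".
    return Client(**client_kwargs)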

New configuration file

Or, we could add the new configuration in a separate file, e.g. called ~/.esmvaltool/dask.yml or ~/.esmvaltool/dask-distributed.yml.

Example ~/.esmvaltool/dask.yml settings for running locally using a LocalCluster:

cluster:
  type: distributed.LocalCluster

Example settings for using an externally managed cluster (e.g. one set up from a Jupyter notebook):

client:
  address: tcp://127.0.0.1:45695

Example settings for running on Levante:

client: {}
cluster:
  type: dask_jobqueue.SLURMCluster
  queue: interactive
  account: bk1088
  cores: 8
  memory: 16GiB
  local_directory: "/work/bd0854/b381141/dask-tmp"
  n_workers: 2
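If the separate file is chosen, reading it could be as simple as the following sketch (the file name follows the suggestion above and is not settled):

from pathlib import Path

import yaml

# Load ~/.esmvaltool/dask.yml if it exists, otherwise fall back to
# an empty configuration (i.e. the current threaded scheduler).
config_file = Path.home() / ".esmvaltool" / "dask.yml"
dask_config = (
    yaml.safe_load(config_file.read_text()) if config_file.exists() else {}
)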

@ESMValGroup/esmvaltool-coreteam Does anyone have an opinion on what the best approach is here? A new file or add to config-user.yml?

bouweandela commented 1 year ago

Another question: would we like to be able to configure Dask distributed from the command line? Or at least pass in the scheduler address if we already have a Dask cluster running, e.g. started from a Jupyter notebook?
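For the externally managed case, the workflow could look like this (a sketch; the command-line option shown is hypothetical, since that is exactly the open question):

# In a Jupyter notebook:
from dask.distributed import LocalCluster

cluster = LocalCluster(n_workers=4)
print(cluster.scheduler_address)  # e.g. tcp://127.0.0.1:45695

# The printed address could then be passed to the tool, e.g. with a
# hypothetical flag:
#   esmvaltool run recipe.yml --dask-scheduler-address=tcp://127.0.0.1:45695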

valeriupredoi commented 1 year ago

cheers @bouweandela - sorry I slacked on this - I'll come back with a deeper and more meaningful analysis (yeh, beware :rofl: ), but before I do that, here are two quick comments:

remi-kazeroni commented 1 year ago

Thanks a lot for your work @bouweandela! I would also suggest not putting Dask-related settings in config-user.yml. I think the Dask configuration topic is too advanced for many of our users and should remain transparent to them. Also, if we were to modify config-user.yml, the change would need to be reflected in the Tool docs, the tutorial, our training activities, ... I would prefer to have that in separate files as you suggest, e.g. .esmvaltool/dask.yml or .esmvaltool/distributed.yml.

Another question: would we like to be able to configure Dask distributed from the command line?

It could be nice to have the possibility to use something like esmvaltool config get_config_dask from the command line. But if you think that's not helpful or too much extra work, let's not worry too much about that.

we should think about HPC-wide configurations, in the case of central installations of the Tool

Yes, that's a good point. I'm just worried that this could make things more complicated for us developers: time to get answers from HPC admins, updates in the software stack, the number of machines to support, ... Perhaps we could simply link to specific documentation on Dask usage if HPC centers provide it (here is an example for DKRZ).

valeriupredoi commented 1 year ago

This is still an ongoing discussion, so it needs reopening.

bouweandela commented 1 year ago

Suggestion by @sloosvel:

Regarding the configuration, is it possible to have multiple configurations in dask.yml? Not every recipe will require the same type of resources.
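One way such per-recipe setups could look is a set of named profiles in dask.yml (a sketch only; the profiles/default keys are assumptions, not an agreed format):

profiles:
  local:
    cluster:
      type: distributed.LocalCluster
  levante:
    cluster:
      type: dask_jobqueue.SLURMCluster
      queue: interactive
      cores: 8
      memory: 16GiB
default: local

A recipe or command-line option could then select which profile to use for a given run.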

bouweandela commented 11 months ago

At the workshop at SMHI, agreement was reached that a new configuration file format would be acceptable. I will make a proposal, but it will not be implemented in time for v2.10.