dask-jobqueue
Looking at #6328 - Investigate whether we can use Dask to handle all drivers directly.
dask can create a 'local' cluster with `dask.distributed.LocalCluster` (but people often do this implicitly by just going straight to a `Client`). And to create a local cluster with all workers running in dedicated subprocesses, there's also `dask.distributed.SubprocessCluster`.
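A minimal local sketch (the worker counts here are arbitrary):

```python
from dask.distributed import Client, LocalCluster

# Explicit: build the local cluster ourselves, then attach a client to it.
cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

# Implicit: Client() with no arguments spins up a LocalCluster behind the scenes.
# client = Client()

print(client.submit(sum, [1, 2, 3]).result())  # 6
```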
For 'real' clusters, the following queues are supported out of the box:

- `LocalCluster`
- `dask_jobqueue.HTCondorCluster`
- `dask_jobqueue.LSFCluster` 👈 should work with our LSF cluster
- `dask_jobqueue.MoabCluster`
- `dask_jobqueue.OARCluster`
- `dask_jobqueue.PBSCluster` 👈 works with our Torque cluster
- `dask_jobqueue.SGECluster`
- `dask_jobqueue.SLURMCluster` 👈 should work with our SLURM cluster

There are sub-projects for other scenarios:

- `dask-mpi` supports deploying a Dask cluster using `MPI4Py`.
- `dask-yarn` supports Apache YARN clusters.
- `dask-kubernetes` provides support for Kubernetes with the `KubeCluster` and `HelmCluster` classes.
- `dask-cloudprovider` provides classes for constructing and managing ephemeral Dask clusters on various cloud platforms, such as AWS Fargate, EC2, and ECS; GCP; AzureVM; Droplet (Digital Ocean); Hetzner.
- A defunct, archived project supports DRMAA: `dask-drmaa`.
Finally, it is also possible to use the awaitable `dask.distributed` base classes `Scheduler`, `Worker` or `Nanny`, and `Client` to create objects explicitly.
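A small sketch of that explicit, awaitable style (everything runs in one process here, just to show the shape of the API):

```python
import asyncio

from dask.distributed import Client, Scheduler, Worker


async def main():
    # Start a scheduler, one worker, and a client explicitly.
    async with Scheduler() as scheduler:
        async with Worker(scheduler.address) as worker:
            async with Client(scheduler.address, asynchronous=True) as client:
                # In asynchronous mode, awaiting the future yields its result.
                result = await client.submit(lambda x: x + 1, 41)
                print(result)  # 42


asyncio.run(main())
```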
Configuration can be given in several places. In order:

- keyword arguments, e.g. `LSFCluster(processes=12)`
- `~/.config/dask/jobqueue.yaml` (which is created when you first import `dask_jobqueue`)
- `/etc/dask/jobqueue.yaml`, which will configure all jobs on the cluster.

For example:
```yaml
jobqueue:
  pbs:
    cores: 12
    memory: 8GiB
    processes: 6
    queue: hb120
    local-directory: $LOCAL_STORAGE
```
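Assuming the YAML above is installed (e.g. in `~/.config/dask/jobqueue.yaml`), a quick sketch of how the values are picked up:

```python
import dask.config
import dask_jobqueue  # importing also registers the jobqueue config defaults

from dask_jobqueue import PBSCluster

# The values show up in dask's config system...
print(dask.config.get("jobqueue.pbs.cores"))  # 12
print(dask.config.get("jobqueue.pbs.queue"))  # hb120

# ...so the cluster can be constructed without repeating them:
cluster = PBSCluster()
```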
On prem:

```bash
ssh st-linrgs034.st.statoil.no
source /prog/res/komodo/stable/enable
komodoenv --no-update ~/kenv
source /private/mtha/kenv/enable
python -m pip install dask-jobqueue
/prog/LSF/script/setup-ssh-keyauth
```
On Azure:

```bash
ssh s034-lcatop01.s034.oc.equinor.com
source /prog/komodo/stable/enable
komodoenv --no-update --root /prog/komodo ~/kenv
source /private/mtha/kenv/enable
python -m pip install dask-jobqueue
```
Then I'm running things in IPython. One could also run a headless Jupyter Lab session on the cluster and `ssh` in from a desktop, but I think we might need a port opened for this (e.g. 8888).
Running on `st-linrgs034.st.statoil.no`:

```python
from dask_jobqueue import LSFCluster as Cluster

cluster = Cluster(cores=6, memory='2GiB', use_stdin=False)
cluster.scale(10)  # n=10 workers
```

This seems to start workers on the cluster. Failing to pass `use_stdin=False` results in an error: `Task exception was never retrieved`.
We can look at a job script:

```python
print(cluster.job_script())
```

Results in:

```bash
#!/usr/bin/env bash
#BSUB -J dask-worker
#BSUB -n 4
#BSUB -R "span[hosts=1]"
#BSUB -M 2000000
#BSUB -W 00:30
/private/mtha/kenv-oct/root/bin/python -m distributed.cli.dask_worker tcp://143.97.182.7:45463 --nthreads 1 --nworkers 4 --memory-limit 476.84MiB --name dummy-name --nanny --death-timeout 60
```
However, submitting jobs does not currently work.
This computes instantly:

```python
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(cores=6, memory="2GiB", queue='hb120')
cluster.scale(10)  # n=10 workers
client = Client(cluster)

future = client.submit(lambda: 1 + 1)
print(future.result())
```

Gives 2, as expected. Then

```python
print(cluster.job_script())
```

Results in:

```bash
#!/usr/bin/env bash
#PBS -N dask-worker
#PBS -q hb120
#PBS -l select=1:ncpus=24:mem=1908MB
#PBS -l walltime=00:30:00
/private/mtha/kenv-oct/root/bin/python -m distributed.cli.dask_worker tcp://10.85.202.20:38549 --nthreads 4 --nworkers 6 --memory-limit 317.89MiB --name dummy-name --nanny --death-timeout 60
```
If there are no references to a future, or the future goes out of scope, then it is (or might be? I'm not sure) considered inactive and Dask will not run its tasks. If, for example, we aren't interested in a task's outputs, only its side effects (e.g. writing a file), then we can force Dask to ignore the inactive future and run the task anyway:
For a script `foo.sh` like:

```bash
#!/usr/bin/bash
sleep 1
echo OK > foo.out
```

We can run `dask.distributed.fire_and_forget()` like this (it also works on a collection of futures):

```python
import os

from dask.distributed import fire_and_forget

fire_and_forget(client.submit(os.system, "./foo.sh"))
```
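Since `fire_and_forget` accepts a collection of futures, something like this sketch should also work, reusing the client from above (the per-realization script names are placeholders):

```python
import os

from dask.distributed import fire_and_forget

# Hypothetical: one fire-and-forget task per realization script; we only care
# about the side effects (files written), not the return values.
futures = [client.submit(os.system, f"./run_realization_{i}.sh") for i in range(5)]
fire_and_forget(futures)
```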
Looks very promising! We should find out whether (in the long run at least) we can remove the websocket communication and replace it purely with the dask API. The topics / docs which might be relevant to this:
When playing around with dask, we can use the following to get status from the dask workers:

```python
import time

from dask.distributed import Client


def dummy_job(i):
    return i * 2  # placeholder workload


client = Client()
futures = [client.submit(dummy_job, i) for i in range(10)]

while not all(f.done() for f in futures):
    time.sleep(1)  # Check every second
    workers_info = client.scheduler_info()["workers"]
    for worker, info in workers_info.items():
        mem_usage = info["metrics"]["memory"]
        print(f"Worker {worker} - Current memory usage: {mem_usage}")

for future in futures:
    print(future.result())
```
Another feature worth exploring is `WorkerPlugin`: https://distributed.dask.org/en/latest/plugins.html#distributed.diagnostics.plugin.WorkerPlugin. Its `transition` hook can then give us access to the worker state: https://distributed.dask.org/en/latest/worker-state.html
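A rough sketch of what such a plugin could look like (the class and the logging are made up for illustration):

```python
from dask.distributed import Client
from distributed.diagnostics.plugin import WorkerPlugin


class TransitionLogger(WorkerPlugin):
    """Hypothetical plugin that logs task state transitions on each worker."""

    def setup(self, worker):
        # Called when the plugin is attached to a worker; keep a reference so
        # we can inspect the worker state (see the worker-state docs above).
        self.worker = worker

    def transition(self, key, start, finish, **kwargs):
        # Called for every task state change on this worker,
        # e.g. "waiting" -> "executing" -> "memory".
        print(f"{self.worker.address}: {key}: {start} -> {finish}")


client = Client()
client.register_worker_plugin(TransitionLogger())
```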
The reason why the LSF plugin to dask does not work on our compute cluster is that it relies on using stdin, which does not work for our LSF setup. This is a hack to fix it: https://github.com/equinor/ert/blob/853b0c0dc4f383b43db12105ec921996920b9b6e/src/ert/ensemble_evaluator/builder/_prefect.py#L72-L79 together with overwriting the corresponding function in the `LSFJob` class in dask.
If you look at the repository at the time of the commit above, there is an implementation of Dask that was running both in Azure and on LSF :slightly_smiling_face:
I found that this also seems to work and allows the cluster to start:

```python
cluster = Cluster(cores=6, memory='2GiB', use_stdin=False)
```

However, I'm still having trouble getting a job to actually run...
Just be aware that if you set `use_stdin=False`, the file defined by `script_filename` still needs to be placed on a network disk and be available to all machines with our LSF setup :slightly_smiling_face:
**`XXXCluster` class documentation**

Bit of a side-note about a weird feature of the docs. This class has the following structure:

`PBSCluster` < `JobQueueCluster` < `distributed.SpecCluster` < `distributed.Cluster` < `distributed.SyncMethodMixin`

The `PBSCluster` class has no `__init__()` method, so it implicitly calls the parent constructor, and that is the API seen in the call signature (blue box). All of the remaining arguments in the documentation are passed as `**job_kwargs` and used by the "job", called `job_cls` inside the `PBSCluster` class. Jobs are instances of `PBSJob`, a subclass of `Job`, which is a subclass of `distributed.ProcessInterface`.
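A quick way to see this from a Python session should be something like:

```python
from dask_jobqueue import PBSCluster

# The cluster class delegates job-specific keyword arguments to its job class:
print(PBSCluster.job_cls)          # the PBSJob class
print(PBSCluster.job_cls.__mro__)  # roughly: PBSJob -> Job -> ProcessInterface -> object
```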
`dask_jobqueue.LocalCluster` is not the same as `distributed.LocalCluster` (the default Dask cluster), which has a substantially different API. The docs say:

> This is mostly for testing. It uses all the same machinery of dask-jobqueue, but rather than submitting jobs to some external job queueing system, it launches them locally. For normal local use, please see `dask.distributed.LocalCluster`
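To keep the two apart when experimenting, something like this (the aliases are just for illustration, and the exact import path of the jobqueue `LocalCluster` may differ between versions):

```python
# Two different classes share the name "LocalCluster":
from dask.distributed import LocalCluster as DistributedLocalCluster  # for normal local use
from dask_jobqueue import LocalCluster as JobqueueLocalCluster        # test-oriented, goes through job scripts
```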
Several arguments are deprecated:

- `extra` is now `worker_extra_args`
- `env_extra` is now `job_script_prologue`
- `header_skip` is now `job_directives_skip`
- `job_extra` is now `job_extra_directives`
- `project` is deprecated for `PBSCluster` and `SLURMCluster` (use `account` instead) and is only used for `LSFCluster` (see also below).

Some arguments only pertain to certain cluster classes. For example, the LSF and SLURM clusters both take (different!) optional extra arguments to control the number of CPUs and the amount of memory that workers can access. In the table below, a blue dot means the class accepts that argument; all other arguments are accepted by all classes. A short sketch using the new argument names follows the table.
Note: The `LocalCluster` here is the one in `dask_jobqueue` (see above).
| Argument | LocalCluster (see note) | LSFCluster | PBSCluster | SLURMCluster | Docstring |
|---|---|---|---|---|---|
| `project` | 🔵 | 🔵 | | | LSF: Project associated with each worker job. Passed to `#BSUB -P` option. |
| `account` | | | 🔵 | 🔵 | PBS: Accounting string associated with each worker job. Passed to `#PBS -A` option. |
| `ncpus` | | 🔵 | | | Number of cpus. Passed to `#BSUB -n` option. |
| `mem` | | 🔵 | | | Request memory in bytes. Passed to `#BSUB -M` option. |
| `lsf_units` | | 🔵 | | | Unit system for large units in resource usage set by `LSF_UNIT_FOR_LIMITS` in the `lsf.conf` file of a cluster. |
| `use_stdin` | | 🔵 | | | LSF's `bsub` command allows us to launch a job by passing it as an argument (`bsub /tmp/jobscript.sh`) or feeding it to stdin (`bsub < /tmp/jobscript.sh`). |
| `job_cpu` | | | | 🔵 | Number of cpu to book in SLURM, if `None`, defaults to worker `threads * processes`. |
| `job_mem` | | | | 🔵 | Amount of memory to request in SLURM. If `None`, defaults to worker `processes * memory`. |
| `resource_spec` | 🔵 | | 🔵 | | PBS: Request resources and specify job placement. Passed to `#PBS -l` option. |
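For reference, a rough sketch using the new argument names (the account, module, and directive values below are placeholders):

```python
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=4,
    memory="8GiB",
    account="my-account",                        # formerly `project`
    job_script_prologue=["module load python"],  # formerly `env_extra`
    worker_extra_args=["--lifetime", "25m"],     # formerly `extra`
    job_extra_directives=["--exclusive"],        # formerly `job_extra`
)
```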
Closing this as completed.
The squad has reviewed the various components of `dask`, `distributed`, and `dask_jobqueue` and concluded that it can very likely help with the implementation of ERT (as we've concluded before). We value features like: `distributed` and one in `dask_jobqueue`. We will move on to some small experiments as we press ahead with refactoring the existing drivers.
Check the suitability of the dask library to replace the drivers interface. Check for more info here: https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.LSFCluster.html Also talk to @sondreso for more info.