dask / dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE
https://jobqueue.dask.org
BSD 3-Clause "New" or "Revised" License

Add ability to instantiate named clusters from jobqueue config #604

Closed: alisterburt closed this 1 year ago

alisterburt commented 1 year ago

Discussed in #543

This PR allows custom clusters of any type to be specified in the jobqueue yaml

Below is an example config file with a second PBS cluster which uses a different queue called 'gpu'

jobqueue:
  pbs:
    cores: 36
    memory: 100GB
    queue: regular
  custom-pbs-cluster:
    job-scheduling-system: pbs
    queue: gpu

This can be instantiated with JobQueueCluster.from_name()

from dask_jobqueue import JobQueueCluster

cluster = JobQueueCluster.from_name("custom-pbs-cluster")

This also works directly with the standard job scheduling systems

from dask_jobqueue import JobQueueCluster

cluster = JobQueueCluster.from_name("pbs")
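
For intuition, here is a rough sketch of how such a name lookup could work on top of the config above. The KNOWN_SYSTEMS mapping and the handling of the job-scheduling-system key are illustrative assumptions, not necessarily how this PR implements it.

import dask.config
from dask_jobqueue import PBSCluster, SGECluster, SLURMCluster

# Map a named config section to a cluster class: the standard names resolve
# directly, custom sections via their "job-scheduling-system" key.
KNOWN_SYSTEMS = {"pbs": PBSCluster, "slurm": SLURMCluster, "sge": SGECluster}

def cluster_class_from_name(name):
    if name in KNOWN_SYSTEMS:
        return KNOWN_SYSTEMS[name]
    section = dask.config.get(f"jobqueue.{name}")
    return KNOWN_SYSTEMS[section["job-scheduling-system"]]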

The main reasons for this PR are:

  1. enable instantiating clusters of arbitrary type (SLURM, PBS, etc.) from scripts without writing against a specific cluster class
  2. allow multiple, slightly different cluster configurations for the same scheduling system
  3. pave the way for cluster discovery from dask-ctl (which would enable named cluster specification from there)

#543 was closed because the features provided here can be made available through a dask-ctl-specific config yaml. I think this PR is still necessary because implementing autodiscovery of dask-jobqueue clusters in dask-ctl requires some way of 'discovering' possible clusters. With this PR, we can simply iterate over the keys in the jobqueue config and instantiate the specified clusters using JobQueueCluster.from_name().
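
As a sketch of that discovery step (assuming the from_name API proposed here), iterating over the jobqueue config section could look like this:

import dask.config
from dask_jobqueue import JobQueueCluster

# Every top-level key under the "jobqueue" config section is a candidate
# named cluster template.
for name in dask.config.get("jobqueue", {}):
    print(name)  # e.g. "pbs", "custom-pbs-cluster"
    # cluster = JobQueueCluster.from_name(name)  # as proposed in this PR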

Additionally, the config in dask-jobqueue is less Python specific than the dask-ctl yaml spec - this may be less of a burden for configuration by HPC admins who are not experts in Python.

#95 implements similar functionality, but at the cluster class level, so it does not solve point 1 above.

cc @guillaumeeb because you were involved in earlier discussions

Thank you in advance for any time spent reviewing!

guillaumeeb commented 1 year ago

Thanks @alisterburt for this proposal.

I went through the discussion in https://github.com/dask-contrib/dask-ctl/issues/61, and I'm not sure of the outcome. @jacobtomlinson do you think the changes here are necessary, or should we wait for a PR in dask-ctl?

alisterburt commented 1 year ago

Thank you for taking the time to go through the PR/discussion @guillaumeeb!

I think the correct answer here depends on how you think dask-jobqueue users should ideally configure their clusters moving forwards. Is that through a jobqueue config or a dask-ctl config? There is a valid argument against having both, but...

I see the two PRs (this and the future one in dask-ctl) as orthogonal and complementary unless there is an effort to move to using dask-ctl as the only way to create/manage clusters in the dask ecosystem

alisterburt commented 1 year ago

fixed the silly test failure but workflow requires re-approval from a maintainer 🙂

alisterburt commented 1 year ago

@guillaumeeb @jacobtomlinson thank you both for the reviews!

@guillaumeeb as there currently is no consensus around whether this PR is wanted I will hold off on updating for now.

@jacobtomlinson I am completely on board with the centralisation of cluster lifecycle management in dask-ctl, one of the main reasons for this PR was to simplify the implementation of dask-ctl discovery in dask-jobqueue in a followup PR 🙂

The problem this PR solves is "How can admins preconfigure user environments with some cluster templates?". The discussion in dask-ctl solves the same problem but for everyone.

The goal of dask-ctl is to be a control plane for all Dask clusters. It won't be the only way to do things, but it will be the most consistent and convenient way to do cluster lifecycle things.

I expect in the future that users can still use the individual libraries directly to "pull the cables and move the flaps" if they want to.

I think the main problem this PR solves is actually "how can users preconfigure multiple clusters of the same type without leaving jobqueue?". This is currently not possible using the jobqueue config and adding dask-ctl into the mix is a good (i.e. not ad-hoc) solution. I think not being able to do this is a deficiency of the current jobqueue config that should be solved in jobqueue.

I also think cluster creation will always be simplest by using the library directly. The value in dask-ctl is value add things like discovery, listing, aggregating, templating and single-pane-of-glass views.

My goal is to try and push as much cluster lifecycle logic as possible into dask-ctl so that it is all in one place and is easily reusable between subprojects. I dislike the idea of individual deployment projects implementing their own solutions to this problem, and other problems like it, rather than collaborating on a shared tool like dask-ctl.

We are on a journey with dask-ctl where hopefully one day it will be upstreamed to distributed or made directly available via the dask namespace like dask.distributed is today. But to get there we need folks to use it rather than working independently.

This is exciting! It feels like clearly delineating what dask-ctl and dask-foo should be responsible for (in the ideal case) moving forwards would be useful here. I'm still a bit unsure

My question to @alisterburt is what can we do to make dask-ctl better and reduce friction there? For example I totally sympathise with your point about the use of the keywords args and kwargs in the cluster spec. Maybe we want to change that to something easier to understand in a v2 spec? (this is why I versioned the spec, so we can iterate nicely)

For my own personal use dask-ctl is working wonderfully, I'm happy creating clusters from yaml paths for now and will be happier creating from names once we get dask-contrib/dask-ctl#61 going. As a developer making things for non-python folks most likely working in HPC, I really like the idea of developing against dask-ctl and creating/discovering dask-foo clusters from there.

This is how I imagine the top of programs I will provide to people

from dask_ctl import get_cluster, create_cluster

# cluster_name comes in from the outside world
try:
    cluster = get_cluster(cluster_name)
except NameError:
    cluster = create_cluster(cluster_name)

The friction for my use case comes in when asking users to configure a cluster and they need jobqueue and ctl rather than just jobqueue - I'm optimising to keep the set of things users interact with as minimal as possible

Overall I am happy to go either way in this repo and will PR to dask-ctl soon regardless, but I do think there is merit to including this PR in jobqueue 🙂

jacobtomlinson commented 1 year ago

> I think not being able to do this is a deficiency of the current jobqueue config that should be solved in jobqueue.

I think the core of my comments is that creating cluster templates is out of scope for dask-{jobqueue,kubernetes,cloudprovider,yarn,etc} and we should be doing this in dask-ctl.

To quote the Zen of Python: "There should be one-- and preferably only one --obvious way to do it."

> one of the main reasons for this PR was to simplify the implementation of dask-ctl discovery in dask-jobqueue

I haven't reviewed this PR line by line yet as I wanted to have the high level discussion first. However at a quick glance I don't think this satisfies the dask-ctl expectations for from_name. Two (or more) processes should both be able to call dask_jobqueue.SLURMCluster.from_name("awesomecluster") and all get a pointer to exactly the same cluster. My reading of this PR would be that a new cluster would be spawned each time.

In this PR name seems to be a template name. In a dask-ctl context name is the unique name of a cluster instance.
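
To make the distinction concrete (purely hypothetical, since dask-jobqueue does not offer this today): under the dask-ctl reading, two independent processes resolving the same name should end up attached to the same running cluster rather than each spawning a new one.

from dask_jobqueue import SLURMCluster  # from_name with these semantics is hypothetical

# process A
a = SLURMCluster.from_name("awesomecluster")

# process B, possibly started later from another script
b = SLURMCluster.from_name("awesomecluster")

# dask-ctl expectation: both handles point at the same scheduler
assert a.scheduler_address == b.scheduler_address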

> This is how I imagine the top of programs I will provide to people

That looks like a nice example. In dask-kubernetes we do a check for an existing cluster with that name before trying to create a new one so the code example is further simplified. It would be nice to do this in more places.

from dask_ctl import create_cluster

cluster = create_cluster("/path/to/spec.yaml")  # If 'foo' exists it connects to it, if not it creates it

# /path/to/spec.yaml
version: 1
module: "dask_kubernetes.operator"
class: "KubeCluster"
kwargs:
    name: "foo"

Most people only want a client anyway, so a one-liner would be:

client = create_cluster("/path/to/spec.yaml").get_client()
# They can still access the cluster object at 'client.cluster' if they want to call 'client.cluster.scale(10)' or something

Here are some examples of how you can use KubeCluster.

from dask_kubernetes.operator import KubeCluster

cluster = KubeCluster(name="foo")  # If 'foo' exists it connects to it, if not it creates it (most users want this)

# Users who don't want this can control creation/connection

cluster = KubeCluster.from_name("foo")  # If 'foo' exists it connects to it, if not it raises an exception (this meets dask-ctl expectations)
cluster = KubeCluster(name="foo", create_mode="CONNECT_ONLY")  # Equivalent to 'from_name'

cluster = KubeCluster(name="foo", create_mode="CREATE_ONLY")  # If 'foo' exists it raises an exception, if not it creates it

> The friction for my use case comes in when asking users to configure a cluster and they need jobqueue and ctl rather than just jobqueue

I totally understand; ultimately these are all just bits of Dask, but we avoid installing them all by default to reduce bloat in the core package. But I understand that separate namespaces can add friction. Do you see this problem being more of an install-time issue or import-time issue? Would making dask-ctl a dependency of dask-jobqueue resolve the install-time friction?

alisterburt commented 1 year ago

> I think the core of my comments is that creating cluster templates is out of scope for dask-{jobqueue,kubernetes,cloudprovider,yarn,etc} and we should be doing this in dask-ctl.
>
> To quote the Zen of Python: "There should be one-- and preferably only one --obvious way to do it."

Cool! Thank you for being patient, this is clear guidance I can get behind 🙂 let's close this PR and push for realising the ideal ecosystem then.

> Two (or more) processes should both be able to call dask_jobqueue.SLURMCluster.from_name("awesomecluster") and all get a pointer to exactly the same cluster. My reading of this PR would be that a new cluster would be spawned each time.

Your reading is correct, and I think my understanding of Dask here is lacking: if I had multiple users in an HPC environment, what are the advantages/disadvantages of having them connect to the same Client/Cluster vs creating their own? Naively, I thought we would want users to run their own for resiliency (other users can't crash my thing) and to avoid overwhelming the task graph if there are multiple heavy users.

Thanks for the dask-kubernetes example - that's clean and flexible!

> I understand that separate namespaces can add friction. Do you see this problem being more of an install-time issue or import-time issue? Would making dask-ctl a dependency of dask-jobqueue resolve the install-time friction?

Sure - I think ~all friction would be removed if dask-ctl were bundled with jobqueue/dask itself at install time and the jobqueue documentation used dask-ctl for cluster templates rather than its own config. @guillaumeeb would you/the other maintainers here accept PRs in this direction?

jacobtomlinson commented 1 year ago

> what are the advantages/disadvantages of having them connect to the same Client/Cluster vs creating their own

It's less about multiple users sharing a single cluster and more about multiple processes sharing a cluster (often sequentially, not at the same time). Workflow managers like Airflow have multiple stages in a pipeline which are each launched as a separate process or allocation with various levels of fan-in and fan-out. It's nice to have the first stage in your pipeline create a Dask cluster and each stage after that reuse the Dask resource and the final stage delete the Dask cluster.
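
A rough sketch of that pattern using the dask-ctl helpers already shown in this thread (the spec path and the cluster name "foo" are illustrative, and it assumes the cluster is discoverable by dask-ctl):

from dask_ctl import create_cluster, get_cluster

# First stage of the pipeline: create the cluster from a spec file
cluster = create_cluster("/path/to/spec.yaml")

# Intermediate stages (separate processes): reconnect to the same cluster by name
cluster = get_cluster("foo")
client = cluster.get_client()
# ... run this stage's work against the shared cluster ...

# Final stage: tear the cluster down once the whole pipeline has finished
get_cluster("foo").close()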

It's also very helpful in some failure modes that are not common on HPC. HPC allocations have a wall time, so if you use dask-jobqueue to create a cluster and the whole thing blows up (and the Python process dies without calling the finalizers via a SIGKILL) the Dask cluster will eventually be reaped by the resource manager, not much harm done. Kubernetes and Cloud don't have any concept of wall time so if you lose the Python process that created your cluster the cleanup is very manual and error-prone. Failures there can be costly if you can't quickly identify left-over Dask clusters and quickly delete them.

With the new KubeCluster implementation I mentioned before you can quickly create a new object that points to your cluster and then call cluster.close() to clean up. This is also a core purpose of dask-ctl, being able to run dask cluster list and dask cluster delete foo is very powerful. Both of these commands depend on being able to recreate a cluster manager object to interact with the cluster.
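
Concretely, that reconnect-and-clean-up flow looks something like this (the cluster name "foo" is illustrative; the CLI commands above do the equivalent from the shell):

from dask_kubernetes.operator import KubeCluster

# Recreate a cluster manager object that points at the existing cluster...
cluster = KubeCluster.from_name("foo")

# ...then use it to tear the cluster down
cluster.close()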

> if dask-ctl were bundled with jobqueue/dask itself at install time and the jobqueue documentation used dask-ctl for cluster templates rather than its own config

I would love to see this. What do you think @guillaumeeb?

alisterburt commented 1 year ago

You're a star @jacobtomlinson - I learned a bunch and the motivation behind centralising this management is now crystal clear, thanks for writing this up!

guillaumeeb commented 1 year ago

> if dask-ctl were bundled with jobqueue/dask itself at install time and the jobqueue documentation used dask-ctl for cluster templates rather than its own config
>
> I would love to see this. What do you think @guillaumeeb?

Well, wow, that's a pretty complete and detailed discussion here. I have to admit I didn't spend a lot of time looking at dask-ctl... So my vision here is not yet broad enough. As first thoughts:

jacobtomlinson commented 1 year ago

> Should we drop (part of?) the underlying code at some point?

I'm not suggesting removing any of the existing configuration in dask-jobqueue. All the configuration does today is set the default kwargs for the HPCCluster objects, which totally makes sense.
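
For reference, a minimal sketch of that existing behaviour; the values mirror the example config at the top of this thread, and setting them programmatically here just stands in for the yaml file:

import dask.config
from dask_jobqueue import PBSCluster

# Values under jobqueue.pbs act as default keyword arguments for PBSCluster
dask.config.set({"jobqueue.pbs.cores": 36, "jobqueue.pbs.memory": "100GB"})

cluster = PBSCluster(queue="regular")  # cores and memory come from the config defaults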

My comment was more that the config here shouldn't be expanded to include lifecycle things like cluster templates. Instead we should work together to do this in dask-ctl so that all dask-foo projects can benefit from shared functionality.

guillaumeeb commented 1 year ago

Ok, perfectly fine for me!