Open mrocklin opened 2 years ago
This is already possible today using the defaultURL setting value. Doing this as part of a deployment would look like:
- Identify the relevant server address
- Prior to users loading the page (not necessarily prior to the Jupyter server startup, but might as well be), put the setting value in an overrides.json https://jupyterlab.readthedocs.io/en/stable/user/directories.html#overrides-json file for JupyterLab to pick up. This could be baked into the environment if it's a stable URL, or done as part of some setup script.

Clearly, I should put a bit of effort into docs here...
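As a rough illustration of the deployment step above, the setup script could write an overrides.json like the following. The scheduler address is a placeholder; the plugin id "dask-labextension:plugin" is taken from the snippet later in this thread.

```python
import json

# "dask-labextension:plugin" is the plugin id used elsewhere in this
# thread; "http://scheduler:8787" is a placeholder dashboard address.
overrides = {
    "dask-labextension:plugin": {
        "defaultURL": "http://scheduler:8787"
    }
}

# JupyterLab picks this file up from one of its settings directories
# (e.g. under <sys-prefix>/share/jupyter/lab/settings/); written to the
# current directory here just for illustration.
with open("overrides.json", "w") as f:
    json.dump(overrides, f, indent=2)
```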
Is there an environment variable that I could set somewhere? See https://github.com/dask/distributed/pull/6737 for context
Not with the current design -- the default URL to populate the search bar with is decided on the frontend, and feeding information to that goes through the config system (i.e., env variables aren't directly visible to the frontend).
Is there an issue with writing a small config file in that case, or is it just more convenient to set an env variable?
So I would do something like the following before starting up the Jupyter server?

import json

with open("overrides.json", mode="w") as f:
    f.write(json.dumps(...))
Yes, something like that, at least for a proof-of-concept. A more complete solution might be to use json5 and merge with other possible config options.
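A slightly more complete version of that proof-of-concept might merge into any pre-existing overrides.json instead of clobbering it (plain json is used here; json5 would be a fuller solution, as noted above). The scheduler address and helper name are placeholders:

```python
import json
import os

def set_default_url(path, url):
    """Merge a defaultURL setting into overrides.json, preserving any
    settings already present for other plugins."""
    settings = {}
    if os.path.exists(path):
        with open(path) as f:
            settings = json.load(f)
    # Merge rather than overwrite, so other plugins' overrides survive.
    plugin = settings.setdefault("dask-labextension:plugin", {})
    plugin["defaultURL"] = url
    with open(path, "w") as f:
        json.dump(settings, f, indent=2)

# Placeholder address for illustration.
set_default_url("overrides.json", "http://scheduler:8787")
```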
To be clear, we could have some kind of translation layer between the Dask config system and the JupyterLab one, but we'd have to build it. I'm a little reluctant to build out a new set of special-case environment variables rather than go through the existing path. I know that some JupyterHub/QHub/2i2c deployments also need to distribute custom settings.
The frontend chooses in order:
I also noticed when kicking the tires on this that the user-populated URL can be a bit too sticky at the moment (you can reset it with a ?reset query parameter). A fix for that is pretty straightforward here.
@ian-r-rose and I spoke. There is some possibility of using the system that currently sends the default at-start-time clusters up to the frontend. This is low enough priority though that we're going to wait until jupyter-on-dask becomes more of a major thing (maybe never).
If we switched out the internals for dask-ctl this would be handled automatically by the cluster discovery. Discovered clusters would be listed automatically in the sidebar. xref #189
Recall, Jacob, that in this case we don't have any Cluster objects, just a scheduler address.
we don't have any Cluster objects, just a scheduler address
I am not sure that this would be insurmountable in a refactor to use dask-ctl. Today, the sidebar in some sense owns the clusters listed there, and they are backed by real Cluster instances. But if we can, I'd love to get out of the business of having a Cluster-backed object altogether, and just have something like "here is a list of clusters we know how to connect to". In that case maybe an address (+ some related metadata?) would be enough.
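One hypothetical shape for that "list of clusters we know how to connect to" (the class name and fields are invented for illustration; this is not an existing dask-labextension API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KnownCluster:
    # Just enough information to connect to the cluster and label it
    # in the sidebar; no live Cluster object is held.
    scheduler_address: str
    dashboard_url: Optional[str] = None
    metadata: dict = field(default_factory=dict)

# Example entry built from a scheduler address alone.
known = [
    KnownCluster(
        "tcp://127.0.0.1:8786",
        dashboard_url="http://127.0.0.1:8787",
        metadata={"source": "env"},
    ),
]
```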
@mrocklin that should be fine. dask_ctl.ProxyCluster fulfils the Cluster API and is useful for representing clusters that can't be rehydrated into other cluster manager objects. Currently, the discovery method for ProxyCluster looks through open ports on localhost and, if it finds schedulers, returns them. So classes like LocalCluster and SSHCluster can be included in the list. It would be very quick to expand this to include other addresses configured in the environment, like DASK_SCHEDULER_ADDRESS.
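The localhost port scan described above can be sketched roughly as follows. This is a simplified stand-in for dask-ctl's actual discovery (which additionally verifies that the listener speaks the Dask protocol), extended with the DASK_SCHEDULER_ADDRESS environment variable as suggested; the port range is a placeholder:

```python
import os
import socket

def discover_scheduler_addresses(ports=range(8786, 8791)):
    """Yield candidate scheduler addresses listening on localhost,
    plus any address configured via DASK_SCHEDULER_ADDRESS."""
    for port in ports:
        # A bare TCP connect only proves something is listening on the
        # port; real discovery would also check it is a Dask scheduler.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(0.2)
            if sock.connect_ex(("127.0.0.1", port)) == 0:
                yield f"tcp://127.0.0.1:{port}"
    env_address = os.environ.get("DASK_SCHEDULER_ADDRESS")
    if env_address:
        yield env_address
```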
But if we can, I'd love to get out of the business of having a Cluster-backed object altogether
I've been down the same thought process too. The trouble is that cluster objects are generally the only place where we can actually represent the abstract concept of a cluster. Dask Gateway and the Dask Kubernetes Operator both have other ways to store and represent this internally, but most other deployment mechanisms don't. My goal with ProxyCluster is to hold this representation in a catch-all way for clusters that aren't easily put back into their original classes.
My goal with ProxyCluster is to hold this representation in a catch-all way for clusters that aren't easily put back into their original classes.
This seems like it could be a good solution -- thanks for the explanation @jacobtomlinson. I'll see if I can put together an example using dask/distributed#6737 and ProxyCluster.
I'm getting more excited about the possibility of integrating dask-ctl here.
I'm also interested in providing a default address.
I tried the following in overrides.json but it doesn't seem to work. Maybe I'm using the wrong plugin name?
{
"dask-labextension:plugin": {
"hideClusterManager": true,
"defaultURL": "<hidden>"
}
}
Thanks for your help.
So, I'm in an interesting situation where I'm running a Jupyter server and I know that it will have exactly one Dask cluster attached to it. I would like to populate the Dask labextension with that scheduler address on startup. Is this easy to do?