Open mrocklin opened 2 years ago
This is already possible today using the defaultURL setting value. Doing this as part of a deployment would look like:
- Identify the relevant server address
- Prior to users loading the page (not necessarily prior to the Jupyter server startup, but might as well be), put the setting value in an overrides.json https://jupyterlab.readthedocs.io/en/stable/user/directories.html#overrides-json file for JupyterLab to pick up. This could be baked into the environment if it's a stable URL, or done as part of some setup script.

Clearly, I should put a bit of effort into docs here...
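As a rough illustration of the deployment step above, the setup script could write an overrides.json like the following. The scheduler address is a placeholder; the plugin id "dask-labextension:plugin" is taken from the snippet later in this thread.

```python
import json

# "dask-labextension:plugin" is the plugin id used elsewhere in this
# thread; "http://scheduler:8787" is a placeholder dashboard address.
overrides = {
    "dask-labextension:plugin": {
        "defaultURL": "http://scheduler:8787"
    }
}

# JupyterLab picks this file up from one of its settings directories
# (e.g. under <sys-prefix>/share/jupyter/lab/settings/); written to the
# current directory here just for illustration.
with open("overrides.json", "w") as f:
    json.dump(overrides, f, indent=2)
```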
Is there an environment variable that I could set somewhere? See https://github.com/dask/distributed/pull/6737 for context
Not with the current design -- the default URL to populate the search bar with is decided on the frontend, and feeding information to that goes through the config system (i.e., env variables aren't directly visible to the frontend).
Is there an issue with writing a small config file in that case, or is it just more convenient to set an env variable?
So I would do something like the following before starting up the Jupyter server?

import json

with open("overrides.json", mode="w") as f:
    f.write(json.dumps(...))
Yes, something like that, at least for a proof-of-concept. A more complete solution might be to use json5 and merge with other possible config options.
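A slightly more complete version of that proof-of-concept might merge into any pre-existing overrides.json instead of clobbering it (plain json is used here; json5 would be a fuller solution, as noted above). The scheduler address and helper name are placeholders:

```python
import json
import os

def set_default_url(path, url):
    """Merge a defaultURL setting into overrides.json, preserving any
    settings already present for other plugins."""
    settings = {}
    if os.path.exists(path):
        with open(path) as f:
            settings = json.load(f)
    # Merge rather than overwrite, so other plugins' overrides survive.
    plugin = settings.setdefault("dask-labextension:plugin", {})
    plugin["defaultURL"] = url
    with open(path, "w") as f:
        json.dump(settings, f, indent=2)

# Placeholder address for illustration.
set_default_url("overrides.json", "http://scheduler:8787")
```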
To be clear, we could have some kind of translation layer between the Dask config system and the JupyterLab one, but we'd have to build it. I'm a little reluctant to build out a new set of special-case environment variables rather than go through the existing path. I know that some JupyterHub/QHub/2i2c deployments also need to distribute custom settings.
The frontend chooses in order:
I also noticed when kicking the tires on this that the user-populated URL can be a bit too sticky at the moment (you can reset it with a ?reset query parameter). A fix for that is pretty straightforward here.
@ian-r-rose and I spoke. There is some possibility of using the system that currently sends the default at-start-time clusters up to the frontend. This is low enough priority though that we're going to wait until jupyter-on-dask becomes more of a major thing (maybe never).
If we switched out the internals for dask-ctl this would be handled automatically by the cluster discovery. Discovered clusters would be listed automatically in the sidebar. xref #189
Recall, Jacob, that in this case we don't have any Cluster objects, just a scheduler address.
we don't have any Cluster objects, just a scheduler address
I am not sure that this would be insurmountable in a refactor to use dask-ctl. Today, the sidebar in some sense owns the clusters listed there, and they are backed by real Cluster instances. But if we can, I'd love to get out of the business of having a Cluster-backed object altogether, and just have something like "here is a list of clusters we know how to connect to". In that case maybe an address (+ some related metadata?) would be enough.
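One hypothetical shape for that "list of clusters we know how to connect to" (the class name and fields are invented for illustration; this is not an existing dask-labextension API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KnownCluster:
    # Just enough information to connect to the cluster and label it
    # in the sidebar; no live Cluster object is held.
    scheduler_address: str
    dashboard_url: Optional[str] = None
    metadata: dict = field(default_factory=dict)

# Example entry built from a scheduler address alone.
known = [
    KnownCluster(
        "tcp://127.0.0.1:8786",
        dashboard_url="http://127.0.0.1:8787",
        metadata={"source": "env"},
    ),
]
```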
@mrocklin that should be fine. dask_ctl.ProxyCluster fulfils the Cluster API and is useful for representing clusters that can't be rehydrated into other cluster manager objects. Currently, the discovery method for ProxyCluster looks through open ports on localhost and, if it finds schedulers, returns them. So classes like LocalCluster and SSHCluster can be included in the list. It would be very quick to expand this to include other addresses configured in the environment, like DASK_SCHEDULER_ADDRESS.
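The localhost port scan described above can be sketched roughly as follows. This is a simplified stand-in for dask-ctl's actual discovery (which additionally verifies that the listener speaks the Dask protocol), extended with the DASK_SCHEDULER_ADDRESS environment variable as suggested; the port range is a placeholder:

```python
import os
import socket

def discover_scheduler_addresses(ports=range(8786, 8791)):
    """Yield candidate scheduler addresses listening on localhost,
    plus any address configured via DASK_SCHEDULER_ADDRESS."""
    for port in ports:
        # A bare TCP connect only proves something is listening on the
        # port; real discovery would also check it is a Dask scheduler.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(0.2)
            if sock.connect_ex(("127.0.0.1", port)) == 0:
                yield f"tcp://127.0.0.1:{port}"
    env_address = os.environ.get("DASK_SCHEDULER_ADDRESS")
    if env_address:
        yield env_address
```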
But if we can, I'd love to get out of the business of having a Cluster-backed object altogether
I've been down the same thought process too. The trouble is that cluster objects are generally the only place where we can actually represent the abstract concept of a cluster. Dask Gateway and the Dask Kubernetes Operator both have other ways to store and represent this internally, but most other deployment mechanisms don't. My goal with ProxyCluster is to hold this representation in a catch-all way for clusters that aren't easily put back into their original classes.
My goal with ProxyCluster is to hold this representation in a catch-all way for clusters that aren't easily put back into their original classes.
This seems like it could be a good solution -- thanks for the explanation @jacobtomlinson. I'll see if I can put together an example using dask/distributed#6737 and ProxyCluster.
I'm getting more excited about the possibility of integrating dask-ctl here.
I'm also interested in providing a default address.
I tried the following in overrides.json but it doesn't seem to work. Maybe I'm using the wrong plugin name?
{
"dask-labextension:plugin": {
"hideClusterManager": true,
"defaultURL": "<hidden>"
}
}
Thanks for your help.
So, I'm in an interesting situation where I'm running a Jupyter server and I know that it will have exactly one Dask cluster attached to it. I would like to populate the Dask labextension with that scheduler address on startup. Is this easy to do?