Closed jmunroe closed 1 month ago
I think it could be good to try to draw a line between setting up and starting dask-gateway clusters on the one hand and working against them on the other. Setting up a dask-gateway cluster for a user in a 2i2c hub means using a dask-gateway client to request its creation and configure various details.
Using the dask cluster created via dask-gateway shouldn't require dask-gateway specific details; it's just typical dask work against a scheduler + workers that happened to be created via dask-gateway.
The docs of relevance can be grouped in:
Here is code from input cells + screenshots of output cells in a jupyter notebook I use to test dask-gateway as it's set up in 2i2c hubs. This can be tested via https://dask-staging.2i2c.cloud.
# Create a gateway object to speak with dask-gateway,
# which in turn can create the dask cluster for you.
#
from dask_gateway import Gateway
gateway = Gateway()
# Request information about the options you can configure
# on a to-be-created dask cluster.
#
# All options are optional.
#
options = gateway.cluster_options()
options
# Now let's create a cluster. After running this cell, you get
# a control panel view to add/remove workers. Manually add at
# least one.
#
# If a new server needs to be started, it will take ~5 minutes
# for it to register and update the numbers of workers.
#
cluster = gateway.new_cluster(options)
cluster
cluster.shutdown()
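To illustrate the "working against it" half, here is a minimal sketch of what could run between `gateway.new_cluster(options)` and `cluster.shutdown()`. The helper names `square` and `sum_of_squares` are made up for this example; `client.submit` and `client.gather` are the usual `dask.distributed` client calls, and the commented usage assumes `cluster.get_client()` and `cluster.scale()` behave as I understand them on a dask-gateway cluster object.

```python
def square(x):
    """A trivial task to run on a worker."""
    return x * x


def sum_of_squares(client, n):
    """Submit n small tasks through a distributed-style client and sum the results.

    Nothing here is dask-gateway specific: it only relies on the client
    offering submit() and gather().
    """
    futures = [client.submit(square, i) for i in range(n)]
    return sum(client.gather(futures))


# Hypothetical usage on a hub, before cluster.shutdown():
#
#   cluster.scale(1)               # programmatic alternative to adding a worker manually
#   client = cluster.get_client()  # a regular dask.distributed Client
#   sum_of_squares(client, 10)
#   client.close()
```

The point is that only the commented lines touch the `cluster` object; the function doing the work takes a client and doesn't care how the scheduler and workers came to exist.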
Sorry I've been unable to make progress, as I have only just finished work on https://github.com/2i2c-org/team-compass/pull/859. This item on documenting dask-gateway will be committed to my next sprint today.
Thank you Erik for providing guidance on scope and context. I don't have a great deal of experience using dask-gateway, so these notes are appreciated.
2024-05-09. @jnywong is serving as shepherd on this one. This will likely be addressed in the next sprint. This is not critical but needs to be written. This can be pulled into the next iteration so that an associated FreshDesk ticket can be closed.
@choldgraf asks where the documentation will appear as the single source of truth (SSOT). The mirrored documentation in the 2i2c site will be dropped in favor of the docs site.
EDIT: Ah I've just figured out that the 'Configurator' was set to quay.io/jupyter/scipy-notebook:2024-03-18. Does this need to be changed back?
ORIGINAL POST:
The workflow above worked for me last week, but I am now having trouble reproducing this workflow on https://dask-staging.2i2c.cloud/.
I think the configs specify pangeo/pangeo-notebook:latest for the image to pull, however on the hub I get the quay.io/jupyter/scipy-notebook:2024-03-18 image, which does not have dask-gateway installed.
jovyan@jupyter-jwong-402i2c-2eorg:~$ env | grep IMAGE
JUPYTER_IMAGE=quay.io/jupyter/scipy-notebook:2024-03-18
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE={JUPYTER_IMAGE_SPEC}
JUPYTER_IMAGE_SPEC=quay.io/jupyter/scipy-notebook:2024-03-18
Currently working around this by manually specifying a custom image on the Community Showcase Hub instead.
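For context on why the `env` output above matters, here is a rough sketch of how I understand the `DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE={JUPYTER_IMAGE_SPEC}` line to be interpreted: Dask maps `DASK_`-prefixed environment variables to nested config keys, and dask-gateway fills `{NAME}` placeholders in cluster option defaults from the environment. The helpers `env_to_config_key` and `expand_template` are hypothetical and only imitate that behavior; they do not call either library.

```python
def env_to_config_key(name):
    """Map e.g. DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE to a dotted config key.

    Assumption: strip the DASK_ prefix, lowercase, and treat a double
    underscore as a nesting separator.
    """
    prefix = "DASK_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} is not a DASK_-prefixed variable")
    return name[len(prefix):].lower().replace("__", ".")


def expand_template(value, environ):
    """Fill {NAME} placeholders in a config value from an environment mapping."""
    return value.format(**environ)


# The three variables reported by `env | grep IMAGE` on the hub.
env = {
    "JUPYTER_IMAGE": "quay.io/jupyter/scipy-notebook:2024-03-18",
    "DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE": "{JUPYTER_IMAGE_SPEC}",
    "JUPYTER_IMAGE_SPEC": "quay.io/jupyter/scipy-notebook:2024-03-18",
}

key = env_to_config_key("DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE")
image = expand_template(env["DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE"], env)
print(key, "->", image)
```

If that reading is right, the dask workers default to whatever image the user server is running, so a scipy-notebook user server implies scipy-notebook workers as well.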
@jnywong I reset the dask-staging hub to use pangeo/pangeo-notebook:latest again, I figure that makes sense for that hub to use! I think it could have been me that updated it to scipy and forgot to change it back, sorry for the trouble!
Reference material for some technical context as to why we resource dask clusters the way we do: https://github.com/2i2c-org/infrastructure/issues/2687
We have a support question regarding dask-gateway: https://2i2c.freshdesk.com/a/tickets/1502
This suggests that we need to update/review our documentation on dask-gateway.
In Review/Waiting
Review of first draft requested in PR https://github.com/2i2c-org/docs/pull/228.