Closed consideRatio closed 2 years ago
We have configuration like this for our singleuser environment. I was hoping that we can influence DaskGateway to take a default value in its configuration of the environment like we have done for image. But well... image is a string and environment is a dict/mapping so I think that made this attempt fail.
extraEnv:
# The default worker image matches the singleuser image.
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
DASK_GATEWAY__CLUSTER__OPTIONS__ENVIRONMENT: 'SCRATCH_BUCKET={SCRATCH_BUCKET}'
DASK_DISTRIBUTED__DASHBOARD_LINK: '/user/{JUPYTERHUB_USER}/proxy/{port}/status'
DASK_LABEXTENSION__FACTORY__MODULE: 'dask_gateway'
DASK_LABEXTENSION__FACTORY__CLASS: 'GatewayCluster'
I've learned that the Mapping configuration represented by a widget requires us to write YAML (or JSON being a subset of YAML) so my attempt above was also invalid as I assumed some key=value comma separated list or similar.
DASK_
prefixed environment variables are loaded into config.ast.literal_eval
is used to evaluate the environment variable value, so strings can become Python dictionaries if they want.dask.config.expand_environment_variables
expands nested values also, but turns out it is never invoked.DASK_INTERNAL_INHERIT_CONFIG
was a side track I tried, but it also didn't invoke variable expansion.{MY_ENV}
syntax) via format_template
.
format_template
doesn't expand nested values like we have for our environment dictionary we want to set.DASK_GATEWAY__CLUSTER__OPTIONS__ENVIRONMENT: '{"SCRATCH_BUCKET": "$(SCRATCH_BUCKET)"}'
AAAAAAAAAAAAAAARGH the order matters for the variable expansion by k8s, and we set it as a dictionary and then kubespawner renders it to a list. SCRATCH_BUCKET
becomes defined later DASK_GATEWAY__CLUSTER__OPTIONS__ENVIRONMENT
apparently but after JUPYTERHUB_USER
.
:scream: https://github.com/jupyterhub/kubespawner/issues/491 :scream:
extraEnv:
# About DASK_ prefixed variables we set:
#
# 1. k8s native variable expansion is applied with $(MY_ENV) syntax. The
# order variables are defined matters though and we are under the
# mercy of how KubeSpawner renders our passed dictionaries.
#
# 2. Dask loads local YAML config.
#
# 3. Dask loads environment variables prefixed DASK_.
# - DASK_ is stripped
# - Capitalization is ignored
# - Double underscore means a nested configuration
# - `ast.literal_eval` is used to parse values
#
# 4. dask-gateway and dask-distributed looks at its config and expands
# expressions in {} again, sometimes only with the environment
# variables as context but sometimes also with additional variables.
#
# References:
# - K8s expansion: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/
# - KubeSpawner issue: https://github.com/jupyterhub/kubespawner/issues/491
# - Dask config: https://docs.dask.org/en/latest/configuration.html
# - Exploration issue: https://github.com/2i2c-org/pilot-hubs/issues/442
#
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
# Try make this show up before SCRATCH_BUCKET but after JUPYTERHUB_USER
# by naming its entry.
SCRATCH_A:
name: DASK_GATEWAY__CLUSTER__OPTIONS__ENVIRONMENT
value: '{"SCRATCH_BUCKET": "$(JUPYTERHUB_USER)"}'
DASK_DISTRIBUTED__DASHBOARD_LINK: '/user/{JUPYTERHUB_USER}/proxy/{port}/status'
DASK_LABEXTENSION__FACTORY__MODULE: 'dask_gateway'
DASK_LABEXTENSION__FACTORY__CLASS: 'GatewayCluster'
I'm not sure I'm happy getting this to work... :scream:
Well it didn't work. This must be fixed upstream in kubespawner properly now I think, given that we have abandoned the k8s native list format for a dictionary format.
Thanks for digging into this, @consideRatio. If the pangeo images set PANGEO_SCRATCH in their entrypoint, rather than via dask gateway, does this work?
Set it and point to juoyterhub_username?
I dislike the idea of onr more workaround now, kubespawner should order envs to prefer ending up with expanded env vars since we use a dict to set them rather than a list.
This is not the first time i run into this, and it costed me plenty of time both times. Other users will also run into this.
Yeah, this variable expansion will continue to be a thing we'll have to deal with.
For reference, https://github.com/2i2c-org/pilot-hubs/blob/76c15bf6496071e32b7c3da60be936e772047af6/hub-templates/basehub/values.yaml#L304 is how SCRATCH_BUCKET is set on the notebook pods when it's enabled via the config-connector
With JupyterHub 1.1.3 and this issue resolved in KubeSpawner, this is now resolved for the JMTE hub.
This is the logic used that works for us now.
extraEnv:
# About DASK_ prefixed variables we set:
#
# 1. k8s native variable expansion is applied with $(MY_ENV) syntax. The
# order variables are defined matters though and we are under the
# mercy of how KubeSpawner renders our passed dictionaries.
#
# 2. Dask loads local YAML config.
#
# 3. Dask loads environment variables prefixed DASK_.
# - DASK_ is stripped
# - Capitalization is ignored
# - Double underscore means a nested configuration
# - `ast.literal_eval` is used to parse values
#
# 4. dask-gateway and dask-distributed looks at its config and expands
# expressions in {} again, sometimes only with the environment
# variables as context but sometimes also with additional variables.
#
# References:
# - K8s expansion: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/
# - KubeSpawner issue: https://github.com/jupyterhub/kubespawner/issues/491
# - Dask config: https://docs.dask.org/en/latest/configuration.html
# - Exploration issue: https://github.com/2i2c-org/pilot-hubs/issues/442
#
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
DASK_GATEWAY__CLUSTER__OPTIONS__ENVIRONMENT: '{"SCRATCH_BUCKET": "$(SCRATCH_BUCKET)"}'
DASK_DISTRIBUTED__DASHBOARD_LINK: '/user/{JUPYTERHUB_USER}/proxy/{port}/status'
DASK_LABEXTENSION__FACTORY__MODULE: 'dask_gateway'
DASK_LABEXTENSION__FACTORY__CLASS: 'GatewayCluster'
Do you want to do something else here, @consideRatio? Or we should just close it?
I figure I'll upstream the relevant parts it to all hubs, creating a PR now.
Summary
When dask-gateway is used to create a dask worker, we don't pass set the SCRATCH_BUCKET environment variable which is probably very relevant also to have set on worker pods.
User Stories
Acceptance criteria
Tasks to complete