tiborkiss opened this issue 2 years ago.
What version of the helm chart is installed? Is it the latest version, released yesterday?
You can check this, for example, by inspecting the labels on the dask-gateway pods.
Oh, daskhub. Okay, hmm, then you should still be using the old version, which means I didn't break something yesterday.
Hmm, I'm unsure what went wrong here, but at least we've ruled out a regression.
It is dask-gateway-2022.6.1:
app.kubernetes.io/component=traefik
app.kubernetes.io/instance=daskhub
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=dask-gateway
app.kubernetes.io/version=2022.6.1
gateway.dask.org/instance=daskhub-dask-gateway
helm.sh/chart=dask-gateway-2022.6.1
pod-template-hash=d7cc865bc
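The label check suggested earlier can also be scripted. A minimal sketch in plain Python, assuming the labels were copied as text, either one per line as above or as the comma-separated list that kubectl get pods --show-labels prints. The helper names parse_labels and chart_version are hypothetical, not part of any library.

```python
from __future__ import annotations

import re

# Hypothetical input: the dask-gateway pod labels copied as plain text.
LABELS = """\
app.kubernetes.io/component=traefik
app.kubernetes.io/instance=daskhub
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=dask-gateway
app.kubernetes.io/version=2022.6.1
gateway.dask.org/instance=daskhub-dask-gateway
helm.sh/chart=dask-gateway-2022.6.1
pod-template-hash=d7cc865bc
"""


def parse_labels(text: str) -> dict[str, str]:
    """Parse 'key=value' labels separated by commas and/or whitespace."""
    labels: dict[str, str] = {}
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue  # empty input yields a single empty token
        key, _, value = token.partition("=")
        labels[key] = value
    return labels


def chart_version(labels: dict[str, str], chart_name: str = "dask-gateway") -> str | None:
    """Return '<version>' from a 'helm.sh/chart=<chart_name>-<version>' label, else None."""
    chart = labels.get("helm.sh/chart", "")
    prefix = chart_name + "-"
    return chart[len(prefix):] if chart.startswith(prefix) else None


labels = parse_labels(LABELS)
print(chart_version(labels))                # 2022.6.1
print(labels["app.kubernetes.io/version"])  # 2022.6.1
```

Splitting on both commas and whitespace lets the same helper accept either copy format, since Kubernetes label keys and values cannot contain those characters.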
In the traefik-daskhub-dask-gateway pod I see this in the log:
time="2022-10-14T09:19:04Z" level=info msg="Configuration loaded from flags."
time="2022-10-14T09:19:04Z" level=warning msg="Cross-namespace reference between IngressRoutes and resources is enabled, please ensure that this is expected (see AllowCrossNamespace option)" providerName=kubernetescrd
time="2022-10-14T09:19:04Z" level=error msg="subset not found for default/api-daskhub-dask-gateway" providerName=kubernetescrd namespace=default ingress=api-daskhub-dask-gateway
time="2022-10-14T09:19:06Z" level=error msg="subset not found for default/api-daskhub-dask-gateway" ingress=api-daskhub-dask-gateway namespace=default providerName=kubernetescrd
time="2022-10-14T09:34:03Z" level=error msg="subset not found for default/api-daskhub-dask-gateway" namespace=default providerName=kubernetescrd ingress=api-daskhub-dask-gateway
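These traefik lines are in logfmt (key=value) format, and the field order differs from line to line, so parsing each line into a dict makes them easier to summarize. A minimal sketch using only the standard library (shlex handles the double-quoted values). The helper names are hypothetical. As far as I know, Traefik's kubernetescrd provider reports "subset not found" when the Service referenced by the IngressRoute has no ready endpoints at that moment, so it may be worth comparing these timestamps with the times the api-daskhub-dask-gateway pod was running and ready.

```python
import shlex
from collections import Counter

# The traefik log lines from the issue (logfmt: key=value, values optionally quoted).
LOG = '''\
time="2022-10-14T09:19:04Z" level=info msg="Configuration loaded from flags."
time="2022-10-14T09:19:04Z" level=warning msg="Cross-namespace reference between IngressRoutes and resources is enabled, please ensure that this is expected (see AllowCrossNamespace option)" providerName=kubernetescrd
time="2022-10-14T09:19:04Z" level=error msg="subset not found for default/api-daskhub-dask-gateway" providerName=kubernetescrd namespace=default ingress=api-daskhub-dask-gateway
time="2022-10-14T09:19:06Z" level=error msg="subset not found for default/api-daskhub-dask-gateway" ingress=api-daskhub-dask-gateway namespace=default providerName=kubernetescrd
time="2022-10-14T09:34:03Z" level=error msg="subset not found for default/api-daskhub-dask-gateway" namespace=default providerName=kubernetescrd ingress=api-daskhub-dask-gateway
'''


def parse_logfmt(line: str) -> dict:
    """Split one logfmt line into a dict; shlex strips the double quotes."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def errors_by_ingress(log_text: str) -> Counter:
    """Count level=error entries per (ingress, msg), regardless of field order."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            counts[(fields.get("ingress"), fields.get("msg"))] += 1
    return counts


for (ingress, msg), n in errors_by_ingress(LOG).items():
    print(f"{n}x {ingress}: {msg}")
```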
In the jupyter-admin pod:
[I 2022-10-14 09:22:16.339 SingleUserLabApp mixins:648] Starting jupyterhub-singleuser server version 2.3.1
[W 2022-10-14 09:22:16.344 SingleUserLabApp _version:68] jupyterhub version 1.5.0 != jupyterhub-singleuser version 2.3.1. This could cause failure to authenticate and result in redirect loops!
[I 2022-10-14 09:22:16.344 SingleUserLabApp serverapp:2726] Serving notebooks from local directory: /home/jovyan
[I 2022-10-14 09:22:16.344 SingleUserLabApp serverapp:2726] Jupyter Server 1.18.1 is running at:
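The second jupyter-admin line reports a major-version gap: the hub runs jupyterhub 1.5.0, while the image ships jupyterhub-singleuser 2.3.1. The warning itself says this can cause authentication failures and redirect loops. If such logs need to be screened across several pods, a minimal regex-based sketch could look like the following. The pattern is written against this exact warning text and is an assumption that may not hold for other JupyterHub versions; the helper names are hypothetical.

```python
import re

WARNING = (
    "[W 2022-10-14 09:22:16.344 SingleUserLabApp _version:68] "
    "jupyterhub version 1.5.0 != jupyterhub-singleuser version 2.3.1. "
    "This could cause failure to authenticate and result in redirect loops!"
)

# Matches the version-mismatch warning text shown in the log above.
MISMATCH = re.compile(
    r"jupyterhub version (\d+)\.(\d+)\.(\d+) != "
    r"jupyterhub-singleuser version (\d+)\.(\d+)\.(\d+)"
)


def hub_versions(line: str):
    """Return ((hub major, minor, patch), (singleuser major, minor, patch)) or None."""
    m = MISMATCH.search(line)
    if m is None:
        return None
    nums = tuple(int(g) for g in m.groups())
    return nums[:3], nums[3:]


def major_mismatch(line: str) -> bool:
    """True if the warning reports different major versions for hub and singleuser."""
    versions = hub_versions(line)
    return versions is not None and versions[0][0] != versions[1][0]


print(hub_versions(WARNING))    # ((1, 5, 0), (2, 3, 1))
print(major_mismatch(WARNING))  # True
```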
Everything else looks normal.
Did you update the helm chart around the time you observed this change?
You may have upgraded daskhub without adjusting for the breaking changes in dask-gateway, which was upgraded at some point in daskhub. See https://gateway.dask.org/changelog.html#id12.
Since this is just a PoC, I recreated everything, including running helm repo remove dask. I checked in the AWS console that everything had been removed, including the JupyterHub image, and then recreated everything from scratch.
I checked that the helm version in my console is 3.9.0, etc. I don't think the "breaking changes" are the issue here.
Anyway, thank you for the tips. I have to admit that I am neither a k8s nor a Helm chart specialist, so any hints are helpful. I am just looking for a clean, repeatable setup and trying to identify the points where the system could break unintentionally during development. That is why, for now, I am simply tearing everything down and recreating it; later, the backend team will take over with continuous operations etc. So far I have recreated it from scratch probably three times without any issue; then I came back from a one-week holiday and now it does this.
I have an EKS-based Dask setup, which was working fine two weeks ago. Yesterday, when I returned to continue my work, the call to GatewayCluster() started throwing ClientResponseError: 405, message='Method Not Allowed', url=URL('http://proxy-public/services/dask-gateway/api/v1/clusters/')
Minimal Complete Verifiable Example:
It throws this:
Two weeks ago it was working fine. The Kubernetes cluster starts fine, and I can also log in to JupyterLab. I noticed one significant difference compared with the run two weeks ago: when I open a terminal in JupyterLab, the AWS client is no longer there. Two weeks ago it was.
Environment: Here is my daskhub.yaml (with the secrets removed, of course).
I tried the pangeo docker image version 2022.09.21, which I picked from https://github.com/pangeo-data/pangeo-docker-images/tags, and got exactly the same result.
Dask version:
Python version: Python 3.9.13
Operating System: whatever is in the pangeo-docker image
Install method (conda, pip, source): The daskhub.yaml file you can see above.
The Helm chart is installed into an EKS cluster with the autoscaler, as described here: https://github.com/awslabs/amazon-asdi/tree/main/examples/eks (I made some changes and simplifications related to the EBS CSI driver, and added some permissions for access to my private S3 storage.)