canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

Connection to Jupyter Notebook throws "not a valid page" #839

Closed natalytvinova closed 8 months ago

natalytvinova commented 8 months ago

Bug Description

Hi team,

I successfully created a Jupyter Notebook; the logs are presented below. Unfortunately, I get a "not a valid page" error when trying to connect to it: (image)

To Reproduce

  1. juju deploy bundle 1.8
  2. juju refresh istio-pilot --channel latest/edge/pr-381 --trust --config default-gateway=kubeflow
  3. juju refresh oidc-gatekeeper --channel latest/edge/pr-135 --trust
  4. juju config dex-auth public-url=https://my.domain.com
  5. juju config oidc-gatekeeper public-url=https://my.domain.com
  6. juju config istio-pilot domain-name=my.domain.com
  7. juju deploy self-signed-certificates
  8. juju relate self-signed-certificates istio-pilot
  9. apply workaround for this bug https://github.com/canonical/admission-webhook-operator/issues/126
  10. create a jupyter-notebook

Environment

Kubeflow bundle 1.8, Juju 3.1.7, Charmed Kubernetes 1.28

Relevant Log Output

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/01-copy-tmp-home
cont-init: info: /etc/cont-init.d/01-copy-tmp-home exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun jupyterlab (no readiness notification)
s6-rc: info: service legacy-services successfully started
[W 2024-02-23 08:35:05.568 ServerApp] ServerApp.token config is deprecated in 2.0. Use IdentityProvider.token.
[I 2024-02-23 08:35:05.581 ServerApp] Package jupyterlab took 0.0000s to import
[I 2024-02-23 08:35:05.586 ServerApp] Package jupyter_server_fileid took 0.0036s to import
[I 2024-02-23 08:35:05.589 ServerApp] Package jupyter_server_mathjax took 0.0021s to import
[I 2024-02-23 08:35:05.598 ServerApp] Package jupyter_server_terminals took 0.0078s to import
[I 2024-02-23 08:35:05.633 ServerApp] Package jupyter_server_ydoc took 0.0345s to import
[I 2024-02-23 08:35:05.666 ServerApp] Package jupyterlab_git took 0.0325s to import
[I 2024-02-23 08:35:05.667 ServerApp] Package nbclassic took 0.0000s to import
[W 2024-02-23 08:35:05.669 ServerApp] A `_jupyter_server_extension_points` function was not found in nbclassic. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-02-23 08:35:05.670 ServerApp] Package nbdime took 0.0000s to import
[I 2024-02-23 08:35:05.670 ServerApp] Package notebook_shim took 0.0000s to import
[W 2024-02-23 08:35:05.670 ServerApp] A `_jupyter_server_extension_points` function was not found in notebook_shim. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-02-23 08:35:05.676 ServerApp] jupyter_server_fileid | extension was successfully linked.
[I 2024-02-23 08:35:05.680 ServerApp] jupyter_server_mathjax | extension was successfully linked.
[I 2024-02-23 08:35:05.684 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-02-23 08:35:05.689 ServerApp] jupyter_server_ydoc | extension was successfully linked.
[I 2024-02-23 08:35:05.694 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-02-23 08:35:05.694 ServerApp] jupyterlab_git | extension was successfully linked.
[I 2024-02-23 08:35:05.698 ServerApp] nbclassic | extension was successfully linked.
[I 2024-02-23 08:35:05.698 ServerApp] nbdime | extension was successfully linked.
[I 2024-02-23 08:35:05.699 ServerApp] Writing Jupyter server cookie secret to /home/jovyan/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2024-02-23 08:35:05.983 ServerApp] notebook_shim | extension was successfully linked.
[W 2024-02-23 08:35:06.164 ServerApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[I 2024-02-23 08:35:06.165 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-02-23 08:35:06.165 FileIdExtension] Configured File ID manager: ArbitraryFileIdManager
[I 2024-02-23 08:35:06.165 FileIdExtension] ArbitraryFileIdManager : Configured root dir: /home/jovyan
[I 2024-02-23 08:35:06.166 FileIdExtension] ArbitraryFileIdManager : Configured database path: /home/jovyan/.local/share/jupyter/file_id_manager.db
[I 2024-02-23 08:35:06.166 FileIdExtension] ArbitraryFileIdManager : Successfully connected to database file.
[I 2024-02-23 08:35:06.166 FileIdExtension] ArbitraryFileIdManager : Creating File ID tables and indices with journal_mode = DELETE
[I 2024-02-23 08:35:06.299 FileIdExtension] Attached event listeners.
[I 2024-02-23 08:35:06.300 ServerApp] jupyter_server_fileid | extension was successfully loaded.
[I 2024-02-23 08:35:06.301 ServerApp] jupyter_server_mathjax | extension was successfully loaded.
[I 2024-02-23 08:35:06.305 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-02-23 08:35:06.306 ServerApp] jupyter_server_ydoc | extension was successfully loaded.
[I 2024-02-23 08:35:06.311 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.11/site-packages/jupyterlab
[I 2024-02-23 08:35:06.311 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2024-02-23 08:35:06.317 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-02-23 08:35:06.326 ServerApp] jupyterlab_git | extension was successfully loaded.
[I 2024-02-23 08:35:06.331 ServerApp] nbclassic | extension was successfully loaded.
[I 2024-02-23 08:35:06.442 ServerApp] nbdime | extension was successfully loaded.
[I 2024-02-23 08:35:06.443 ServerApp] Serving notebooks from local directory: /home/jovyan
[I 2024-02-23 08:35:06.443 ServerApp] Jupyter Server 2.9.1 is running at:
[I 2024-02-23 08:35:06.443 ServerApp] http://uat-2-0:8888/notebook/admin/uat-2/lab
[I 2024-02-23 08:35:06.443 ServerApp]     http://127.0.0.1:8888/notebook/admin/uat-2/lab
[I 2024-02-23 08:35:06.443 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation)

Additional Context

No response

syncronize-issues-to-jira[bot] commented 8 months ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5372.

This message was autogenerated

kimwnasptd commented 8 months ago

@natalytvinova this looks like an issue with networking.

First of all, if you open your browser's dev tools and go to the Network tab, what's the response to the request that tries to hit /notebook/admin/uat-2/lab? Is it a 404?

Then, could you provide us with the output of kubectl get virtualservices -n admin -o yaml? If it's a 404, it's most probably related to the Istio Gateway and the VirtualService that routes this path to the Notebook's K8s Service.

natalytvinova commented 8 months ago

Hi @kimwnasptd, in the dev tools the only thing failing is: GET http://my.domain.com/api/metrics 405 (Method Not Allowed)

Meanwhile the main request returns 200:

Request URL:
http://my.domain.com/jupyter/api/namespaces/admin/notebooks
Request Method:
GET
Status Code:
200 OK

I also see this in the jupyter-ui pod logs (container jupyter-ui):

2024-02-26T06:56:25.274Z [pebble] Check "up" failure 7921 (threshold 3): received non-20x status code 401
2024-02-26T06:56:55.266Z [jupyter-ui] 2024-02-26 06:56:55,265 | kubeflow.kubeflow.crud_backend.errors.handlers | ERROR | HTTP Exception handled: 401 Unauthorized: No user detected.
2024-02-26T06:56:55.269Z [pebble] Check "up" failure 7922 (threshold 3): received non-20x status code 401
2024-02-26T06:56:55.269Z [jupyter-ui] 127.0.0.1 - - [26/Feb/2024:06:56:55 +0000] "GET / HTTP/1.1" 401 69 "-" "Go-http-client/1.1"

Here is the VirtualServices YAML:

apiVersion: v1
items:
- apiVersion: networking.istio.io/v1beta1
  kind: VirtualService
  metadata:
    creationTimestamp: "2024-02-23T12:31:43Z"
    generation: 1
    name: notebook-admin-no-vol
    namespace: admin
    ownerReferences:
    - apiVersion: kubeflow.org/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Notebook
      name: no-vol
      uid: 2ae69814-3d1d-4c01-ad35-0bcf5de0b3fe
    resourceVersion: "62773663"
    uid: 213c97c8-2f77-468b-8816-125817ed86e3
  spec:
    gateways:
    - kubeflow/kubeflow-gateway
    hosts:
    - '*'
    http:
    - headers:
        request:
          set: {}
      match:
      - uri:
          prefix: /notebook/admin/no-vol/
      rewrite:
        uri: /notebook/admin/no-vol/
      route:
      - destination:
          host: no-vol.admin.svc.cluster.local
          port:
            number: 80
- apiVersion: networking.istio.io/v1beta1
  kind: VirtualService
  metadata:
    creationTimestamp: "2024-02-23T13:08:25Z"
    generation: 1
    name: notebook-admin-tyest
    namespace: admin
    ownerReferences:
    - apiVersion: kubeflow.org/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Notebook
      name: tyest
      uid: 0517acfc-8600-41ea-96c3-d3ff4f7b6e6d
    resourceVersion: "62802573"
    uid: 5edb6893-6605-4f8f-8e86-1295c1a76956
  spec:
    gateways:
    - kubeflow/kubeflow-gateway
    hosts:
    - '*'
    http:
    - headers:
        request:
          set: {}
      match:
      - uri:
          prefix: /notebook/admin/tyest/
      rewrite:
        uri: /notebook/admin/tyest/
      route:
      - destination:
          host: tyest.admin.svc.cluster.local
          port:
            number: 80
- apiVersion: networking.istio.io/v1beta1
  kind: VirtualService
  metadata:
    creationTimestamp: "2024-02-23T08:29:00Z"
    generation: 1
    name: notebook-admin-uat
    namespace: admin
    ownerReferences:
    - apiVersion: kubeflow.org/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Notebook
      name: uat
      uid: 145059af-7eae-4b00-9d74-ea41e5bee7a8
    resourceVersion: "62607516"
    uid: 5e935a22-f789-4c71-82cf-7723dea404cb
  spec:
    gateways:
    - kubeflow/kubeflow-gateway
    hosts:
    - '*'
    http:
    - headers:
        request:
          set: {}
      match:
      - uri:
          prefix: /notebook/admin/uat/
      rewrite:
        uri: /notebook/admin/uat/
      route:
      - destination:
          host: uat.admin.svc.cluster.local
          port:
            number: 80
- apiVersion: networking.istio.io/v1beta1
  kind: VirtualService
  metadata:
    creationTimestamp: "2024-02-23T08:33:10Z"
    generation: 1
    name: notebook-admin-uat-2
    namespace: admin
    ownerReferences:
    - apiVersion: kubeflow.org/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Notebook
      name: uat-2
      uid: 59f63b8e-8f41-414a-980e-e68db67eeea5
    resourceVersion: "62610487"
    uid: f9a4ebb4-59b1-4e9a-b6a3-bc00e62b3968
  spec:
    gateways:
    - kubeflow/kubeflow-gateway
    hosts:
    - '*'
    http:
    - headers:
        request:
          set: {}
      match:
      - uri:
          prefix: /notebook/admin/uat-2/
      rewrite:
        uri: /notebook/admin/uat-2/
      route:
      - destination:
          host: uat-2.admin.svc.cluster.local
          port:
            number: 80
kind: List
metadata:
  resourceVersion: ""

natalytvinova commented 8 months ago

After a debug session with @kimwnasptd, we found that the Gateway was actually named kubeflow instead of kubeflow-gateway. Some of the VirtualServices, like kubeflow-dashboard, were pointing at the actual kubeflow Gateway:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  creationTimestamp: "2024-02-01T16:24:31Z"
  generation: 2
  labels:
    app.juju.is/created-by: istio-pilot
    app.kubernetes.io/instance: istio-pilot-kubeflow
    kubernetes-resource-handler-scope: ingress
  name: kubeflow-dashboard
  namespace: kubeflow
  resourceVersion: "59745310"
  uid: 33b236f3-b1dc-447d-a5bc-d1442ff71da4
spec:
  gateways:
  - kubeflow/kubeflow
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /
    rewrite:
      uri: /
    route:
    - destination:
        host: kubeflow-dashboard.kubeflow.svc.cluster.local
        port:
          number: 8082

But others, like the notebooks, were using kubeflow-gateway:

ubuntu@infra-1-medma:~$ kubectl get virtualservices -n admin  -o yaml
apiVersion: v1
items:
- apiVersion: networking.istio.io/v1beta1
  kind: VirtualService
  metadata:
    creationTimestamp: "2024-02-23T12:31:43Z"
    generation: 1
    name: notebook-admin-no-vol
    namespace: admin
    ownerReferences:
    - apiVersion: kubeflow.org/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Notebook
      name: no-vol
      uid: 2ae69814-3d1d-4c01-ad35-0bcf5de0b3fe
    resourceVersion: "62773663"
    uid: 213c97c8-2f77-468b-8816-125817ed86e3
  spec:
    gateways:
    - kubeflow/kubeflow-gateway
    hosts:
    - '*'
    http:
    - headers:
        request:
          set: {}
      match:
      - uri:
          prefix: /notebook/admin/no-vol/
      rewrite:
        uri: /notebook/admin/no-vol/
      route:
      - destination:
          host: no-vol.admin.svc.cluster.local
          port:
            number: 80

The default-gateway Juju config option for istio-pilot was set to "kubeflow". After changing it to "kubeflow-gateway", all the names were aligned and access was restored.
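
For anyone who hits the same symptom, here is a minimal sketch of the check we did by hand: it lists every VirtualService that references a Gateway that does not exist. It assumes the resources have already been loaded into plain Python dicts (for example from the items of kubectl get virtualservices -A -o json); the function names are hypothetical and the sample data only mirrors the names from this thread.

```python
from typing import Iterable, List, Set, Tuple


def gateway_key(ref: str, vs_namespace: str) -> Tuple[str, str]:
    """Resolve a VirtualService gateway reference to (namespace, name).

    Istio accepts "<namespace>/<name>"; a bare name is looked up in the
    VirtualService's own namespace.
    """
    if "/" in ref:
        namespace, name = ref.split("/", 1)
        return namespace, name
    return vs_namespace, ref


def dangling_gateway_refs(
    virtual_services: Iterable[dict],
    existing_gateways: Set[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (vs namespace, vs name, gateway ref) for each missing Gateway."""
    dangling = []
    for vs in virtual_services:
        meta = vs["metadata"]
        for ref in vs.get("spec", {}).get("gateways", []):
            if ref == "mesh":  # reserved word for sidecars, not a Gateway resource
                continue
            if gateway_key(ref, meta["namespace"]) not in existing_gateways:
                dangling.append((meta["namespace"], meta["name"], ref))
    return dangling


# Sample data mirroring this issue: the only Gateway is kubeflow/kubeflow.
existing_gateways = {("kubeflow", "kubeflow")}
virtual_services = [
    {
        "metadata": {"name": "kubeflow-dashboard", "namespace": "kubeflow"},
        "spec": {"gateways": ["kubeflow/kubeflow"]},
    },
    {
        "metadata": {"name": "notebook-admin-uat-2", "namespace": "admin"},
        "spec": {"gateways": ["kubeflow/kubeflow-gateway"]},
    },
]

print(dangling_gateway_refs(virtual_services, existing_gateways))
# prints [('admin', 'notebook-admin-uat-2', 'kubeflow/kubeflow-gateway')]
```

With the sample data, only the notebook VirtualService is reported, which matches what we saw: the dashboard was reachable while the notebook was not.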

DnPlas commented 8 months ago

Hey @natalytvinova, thanks for raising this, and thanks @kimwnasptd for the follow-up.

Just to add more context and prevent this issue from happening in the future: the istio-pilot charm has a configuration option called default-gateway, which is used for naming the Gateway resource. That is, whenever you deploy both istio-operators, the Gateway that gets created receives the name that you set in that option. By default, this value is istio-gateway.

In a Kubeflow deployment, many applications assume the Gateway name to be kubeflow-gateway, and that is why we set it to that value in the CKF bundle definition. It is very important that this value doesn't change in your CKF deployment: while some applications will be able to catch the change (like the dashboard did), user workloads (like the notebook you created) won't, and will just assume that there is a Gateway called kubeflow-gateway in the kubeflow namespace. This behaviour is the same upstream; we cannot really do much about it, btw.
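
As a rough guard against this kind of drift, a small check like the one below could be run after any refresh or config change. It is only a sketch: the helper names are hypothetical, the expected value is the kubeflow-gateway name mentioned above, and read_default_gateway assumes a juju client that is connected to the model where istio-pilot is deployed.

```python
import subprocess
from typing import Optional

# Name that Kubeflow workloads assume, as set by the CKF bundle.
EXPECTED_GATEWAY = "kubeflow-gateway"


def gateway_config_problem(
    configured: str, expected: str = EXPECTED_GATEWAY
) -> Optional[str]:
    """Return a description of the problem, or None if the value is fine."""
    value = configured.strip()  # command output usually ends with a newline
    if value == expected:
        return None
    return f"default-gateway is {value!r}, but Kubeflow workloads expect {expected!r}"


def read_default_gateway(app: str = "istio-pilot") -> str:
    """Read the current value with `juju config <app> default-gateway`."""
    result = subprocess.run(
        ["juju", "config", app, "default-gateway"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, gateway_config_problem(read_default_gateway()) would return None on a correctly configured deployment, and a message naming the unexpected value on a deployment like the one in this issue.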

That being said, this doesn't seem like an issue, but rather a misconfiguration, so I'm closing it. Feel free to re-open if you think this is still an issue.