canonical / notebook-operators

Charmed Jupyter Notebooks
Apache License 2.0

Jupyter-ui cannot create a new notebook #360

Closed orfeas-k closed 4 months ago

orfeas-k commented 4 months ago

Bug Description

Deployed the kubeflow bundle from latest/edge, and jupyter-ui fails to create a notebook in the following two cases.

Error 400

  1. New notebook page
  2. Write notebook name
  3. Hit create

    This results in the following error:
    {
    "log": "No value provided for: tolerationGroup",
    "status": 400,
    "success": false,
    "user": "admin"
    }

    and these logs are produced in the jupyter-ui container: logs

Error 500

  1. New notebook page
  2. Write notebook name
  3. Select None as Affinity and tolerations
  4. Hit create

    This results in the following error:
    {
    "log": "An error occured in the backend.",
    "status": 500,
    "success": false,
    "user": "admin"
    }

    and these logs are produced in the jupyter-ui container: logs

All logs from jupyter-ui container

To Reproduce

Deploy CKF from latest/edge in the environment described below and create a notebook.

Environment

AWS EC2 instance m5.4xlarge, 64GB, 16 CPU
Juju 3.1.8-genericlinux-amd64
MicroK8s v1.26.14 revision 6576

Juju status

$ juju status
Model     Controller  Cloud/Region        Version  SLA          Timestamp
kubeflow  microk8s    microk8s/localhost  3.1.8    unsupported  15:00:04Z

SAAS                             Status  Store           URL
grafana-dashboards               active  cos-controller  admin/cos.grafana-dashboards
prometheus-receive-remote-write  active  cos-controller  admin/cos.prometheus-receive-remote-write

App                        Version                  Status  Scale  Charm                    Channel              Rev  Address         Exposed  Message
admission-webhook                                   active      1  admission-webhook        latest/edge          298  10.152.183.35   no       
argo-controller                                     active      1  argo-controller          latest/edge          453  10.152.183.214  no       
dex-auth                                            active      1  dex-auth                 latest/edge          450  10.152.183.202  no       
envoy                                               active      1  envoy                    latest/edge          173  10.152.183.171  no       
grafana-agent-k8s          0.35.2                   active      1  grafana-agent-k8s        edge                  72  10.152.183.156  no       logging-consumer: off
istio-ingressgateway                                active      1  istio-gateway            latest/edge          837  10.152.183.100  no       
istio-pilot                                         active      1  istio-pilot              latest/edge          813  10.152.183.170  no       
jupyter-controller                                  active      1  jupyter-controller       latest/edge          921  10.152.183.243  no       
jupyter-ui                                          active      1  jupyter-ui               latest/edge          842  10.152.183.72   no       
katib-controller                                    active      1  katib-controller         latest/edge          520  10.152.183.127  no       
katib-db                   8.0.36-0ubuntu0.22.04.1  active      1  mysql-k8s                8.0/edge             137  10.152.183.206  no       
katib-db-manager                                    active      1  katib-db-manager         latest/edge          484  10.152.183.63   no       
katib-ui                                            active      1  katib-ui                 latest/edge          495  10.152.183.248  no       
kfp-api                                             active      1  kfp-api                  latest/edge         1173  10.152.183.110  no       
kfp-db                     8.0.36-0ubuntu0.22.04.1  active      1  mysql-k8s                8.0/edge             137  10.152.183.147  no       
kfp-metadata-writer                                 active      1  kfp-metadata-writer      latest/edge/pr-436   264  10.152.183.113  no       
kfp-persistence                                     active      1  kfp-persistence          latest/edge         1180  10.152.183.43   no       
kfp-profile-controller                              active      1  kfp-profile-controller   latest/edge         1138  10.152.183.71   no       
kfp-schedwf                                         active      1  kfp-schedwf              latest/edge         1192  10.152.183.64   no       
kfp-ui                                              active      1  kfp-ui                   latest/edge         1175  10.152.183.165  no       
kfp-viewer                                          active      1  kfp-viewer               latest/edge         1205  10.152.183.65   no       
kfp-viz                                             active      1  kfp-viz                  latest/edge         1126  10.152.183.49   no       
knative-eventing                                    active      1  knative-eventing         latest/edge          369  10.152.183.74   no       
knative-operator                                    active      1  knative-operator         latest/edge          344  10.152.183.59   no       
knative-serving                                     active      1  knative-serving          latest/edge          370  10.152.183.240  no       
kserve-controller                                   active      1  kserve-controller        latest/edge          519  10.152.183.249  no       
kubeflow-dashboard                                  active      1  kubeflow-dashboard       latest/edge/pr-186   501  10.152.183.205  no       
kubeflow-profiles                                   active      1  kubeflow-profiles        latest/edge          362  10.152.183.70   no       
kubeflow-roles                                      active      1  kubeflow-roles           latest/edge          197  10.152.183.84   no       
kubeflow-volumes                                    active      1  kubeflow-volumes         latest/edge          279  10.152.183.181  no       
metacontroller-operator                             active      1  metacontroller-operator  latest/edge          268  10.152.183.111  no       
minio                      res:oci-image@1755999    active      1  minio                    latest/edge          292  10.152.183.148  no       
mlmd                                                active      1  mlmd                     latest/edge          160  10.152.183.158  no       
oidc-gatekeeper                                     active      1  oidc-gatekeeper          latest/edge          371  10.152.183.231  no       
pvcviewer-operator                                  active      1  pvcviewer-operator       latest/edge           63  10.152.183.126  no       
seldon-controller-manager                           active      1  seldon-core              latest/edge          670  10.152.183.18   no       
tensorboard-controller                              active      1  tensorboard-controller   latest/edge          266  10.152.183.133  no       
tensorboards-web-app                                active      1  tensorboards-web-app     latest/edge          254  10.152.183.138  no       
training-operator                                   active      1  training-operator        latest/edge          346  10.152.183.233  no  
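
For anyone comparing environments, the two charms most relevant to this report (jupyter-ui and jupyter-controller) can be pulled out of the status programmatically. This is only a rough sketch: it assumes the output of `juju status --format=json` has already been parsed into a dict, and that each application entry carries `charm-channel` and `charm-rev` keys (an assumption about the JSON layout, worth checking against your Juju version). The function name `charm_versions` is made up for illustration.

```python
from typing import Dict, Iterable, Tuple


def charm_versions(status: dict, names: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Return {app: (channel, revision)} for the requested applications.

    `status` is assumed to be the parsed JSON from `juju status --format=json`.
    Applications missing from the status are skipped silently.
    """
    apps = status.get("applications", {})
    result = {}
    for name in names:
        app = apps.get(name)
        if app is None:
            continue
        # Key names below are assumptions about the JSON layout.
        result[name] = (app.get("charm-channel", ""), int(app.get("charm-rev", -1)))
    return result


# Sample data copied from the table above (only the fields used here).
sample_status = {
    "applications": {
        "jupyter-controller": {"charm-channel": "latest/edge", "charm-rev": 921},
        "jupyter-ui": {"charm-channel": "latest/edge", "charm-rev": 842},
        "kubeflow-dashboard": {"charm-channel": "latest/edge/pr-186", "charm-rev": 501},
    }
}

versions = charm_versions(sample_status, ["jupyter-ui", "jupyter-controller"])
```

With real data, the dict would come from something like `json.loads(subprocess.check_output(["juju", "status", "--format=json"]))`.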

Relevant Log Output

above

Additional Context

No response

syncronize-issues-to-jira[bot] commented 4 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5604.

This message was autogenerated

ca-scribner commented 4 months ago

The most likely cause is this PR. I wonder if we now pass an incorrect "null" value when a notebook wants none of the available tolerations, etc.
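
To make that hypothesis concrete, here is a minimal sketch of how a form handler could tell a missing field apart from an explicit null or "none". This is not the actual jupyter-ui code: `BadRequest`, `TOLERATION_GROUPS` and `resolve_tolerations` are hypothetical names, and the "none" sentinel is an assumption. Only the field name `tolerationGroup` and the 400 message text are taken from the report above.

```python
class BadRequest(Exception):
    """Stands in for an error that a web backend would map to HTTP 400."""


# Hypothetical toleration groups an admin might configure.
TOLERATION_GROUPS = {
    "gpu": [{"key": "gpu", "operator": "Exists", "effect": "NoSchedule"}],
}


def resolve_tolerations(body: dict, defaults: dict) -> list:
    """Pick the tolerations for a new notebook from the submitted form.

    - Field absent from both the form and the defaults: BadRequest (a 400).
    - Field present but null, empty or "none": no tolerations, not an error.
    - Field names an unknown group: BadRequest instead of an unhandled
      KeyError, which a backend would typically surface as a 500.
    """
    if "tolerationGroup" in body:
        group = body["tolerationGroup"]
    elif "tolerationGroup" in defaults:
        group = defaults["tolerationGroup"]
    else:
        raise BadRequest("No value provided for: tolerationGroup")

    # Treat every "nothing selected" spelling the same way.
    if group in (None, "", "none"):
        return []

    if group not in TOLERATION_GROUPS:
        raise BadRequest(f"Unknown toleration group: {group}")
    return TOLERATION_GROUPS[group]
```

If the real handler skips the normalisation step in the middle, a null coming from the UI could plausibly fall through to a lookup and fail there, which would match the 500 case. That is a guess to check during triage, not a confirmed diagnosis.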

For whoever triages this, use the minimal deployment defined in 345's testing instructions. On that deployment, try the following (no need to try everything below, just keep going until you find the problem):