canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

UATs are failing with 401 in one-click installation #951

Open kimwnasptd opened 2 days ago

kimwnasptd commented 2 days ago

Bug Description

After installing CKF on Azure with the one-click deployment, some of the UATs fail with 401 errors.

Specifically, the failing tests are the ones that trigger a Pipeline, so something is most likely wrong with the ServiceAccount that gets injected into the Notebooks (maybe the PodDefault is never created?).
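One way to test the PodDefault hypothesis is to inspect the profile namespace directly. The sketch below assumes the UATs use a profile namespace named `test-kubeflow` (as in the pod listing further down) and a pipeline-access PodDefault named `access-ml-pipeline`, the name used in the upstream Kubeflow Pipelines multi-user docs; neither name is confirmed for the one-click deployment. It also assumes the KFP SDK locates its ServiceAccount token through the `KF_PIPELINES_SA_TOKEN_PATH` environment variable. The helper names `check_poddefault` and `check_token_env` are hypothetical.

```shell
#!/usr/bin/env bash
# Diagnostic sketch for the "PodDefault never created" hypothesis.
# ASSUMPTIONS: namespace, PodDefault name and env var name are illustrative;
# check_poddefault and check_token_env are hypothetical helpers.

# Prints "found" if the PodDefault exists in the namespace, "missing" otherwise.
check_poddefault() {
  local namespace="$1" name="$2"
  if kubectl get poddefaults.kubeflow.org "$name" -n "$namespace" >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

# Prints "injected" if any container in the pod declares the
# KF_PIPELINES_SA_TOKEN_PATH env var (expected to be added by the PodDefault),
# "not-injected" otherwise.
check_token_env() {
  local namespace="$1" pod="$2" names
  names="$(kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.containers[*].env[*].name}' 2>/dev/null)"
  case " $names " in
    *" KF_PIPELINES_SA_TOKEN_PATH "*) echo "injected" ;;
    *) echo "not-injected" ;;
  esac
}
```

For example, `check_poddefault test-kubeflow access-ml-pipeline` followed by `check_token_env test-kubeflow <notebook-pod>`. If the assumed names are right, a `missing` or `not-injected` result on the one-click install would support the hypothesis.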

To Reproduce

  1. Deploy CKF on Azure with one-click
  2. Run the UATs

Environment

CKF 1.8 on Azure

Relevant Log Output

https://pastebin.canonical.com/p/wMJ4TbQQPj/

Additional Context

No response

syncronize-issues-to-jira[bot] commented 2 days ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5915.

This message was autogenerated

misohu commented 19 hours ago

Environment: Juju 3.4.4, AKS v1.28.9, Kubeflow 1.8/stable

I created the one-click deployment and made sure all components are active.

Model     Controller  Cloud/Region       Version  SLA          Timestamp
kubeflow  manual      k8s-cloud/uksouth  3.4.4    unsupported  08:38:56Z

SAAS        Status  Store  URL
grafana     active  local  admin/cos.grafana
prometheus  active  local  admin/cos.prometheus

App                        Version                  Status  Scale  Charm                    Channel          Rev  Address       Exposed  Message
admission-webhook                                   active      1  admission-webhook        1.8/stable       301  10.0.236.178  no       
argo-controller                                     active      1  argo-controller          3.3.10/stable    424  10.0.6.83     no       
dex-auth                                            active      1  dex-auth                 2.36/stable      422  10.0.236.34   no       
envoy                      res:oci-image@cc06b3e    active      1  envoy                    2.0/stable       194  10.0.183.212  no       
grafana-agent-k8s          0.40.4                   active      1  grafana-agent-k8s        latest/edge       80  10.0.200.220  no       logging-consumer: off
istio-ingressgateway                                active      1  istio-gateway            1.17/stable     1000  10.0.180.252  no       
istio-pilot                                         active      1  istio-pilot              1.17/stable     1011  10.0.24.151   no       
jupyter-controller                                  active      1  jupyter-controller       1.8/stable       849  10.0.237.229  no       
jupyter-ui                                          active      1  jupyter-ui               1.8/stable       858  10.0.121.7    no       
katib-controller           res:oci-image@31ccd70    active      1  katib-controller         0.16/stable      576  10.0.33.4     no       
katib-db                   8.0.36-0ubuntu0.22.04.1  active      1  mysql-k8s                8.0/stable       153  10.0.59.17    no       
katib-db-manager                                    active      1  katib-db-manager         0.16/stable      539  10.0.54.131   no       
katib-ui                                            active      1  katib-ui                 0.16/stable      422  10.0.193.116  no       
kfp-api                                             active      1  kfp-api                  2.0/stable      1283  10.0.162.128  no       
kfp-db                     8.0.36-0ubuntu0.22.04.1  active      1  mysql-k8s                8.0/stable       153  10.0.7.99     no       
kfp-metadata-writer                                 active      1  kfp-metadata-writer      2.0/stable       334  10.0.96.190   no       
kfp-persistence                                     active      1  kfp-persistence          2.0/stable      1291  10.0.228.27   no       
kfp-profile-controller                              active      1  kfp-profile-controller   2.0/stable      1315  10.0.54.172   no       
kfp-schedwf                                         active      1  kfp-schedwf              2.0/stable      1466  10.0.211.104  no       
kfp-ui                                              active      1  kfp-ui                   2.0/stable      1285  10.0.236.8    no       
kfp-viewer                                          active      1  kfp-viewer               2.0/stable      1317  10.0.20.142   no       
kfp-viz                                             active      1  kfp-viz                  2.0/stable      1235  10.0.195.109  no       
knative-eventing                                    active      1  knative-eventing         1.10/stable      353  10.0.15.30    no       
knative-operator                                    active      1  knative-operator         1.10/stable      328  10.0.30.18    no       
knative-serving                                     active      1  knative-serving          1.10/stable      409  10.0.235.243  no       
kserve-controller                                   active      1  kserve-controller        0.11/stable      523  10.0.133.162  no       
kubeflow-dashboard                                  active      1  kubeflow-dashboard       1.8/stable       582  10.0.46.190   no       
kubeflow-profiles                                   active      1  kubeflow-profiles        1.8/stable       355  10.0.240.56   no       
kubeflow-roles                                      active      1  kubeflow-roles           1.8/stable       187  10.0.247.9    no       
kubeflow-volumes           res:oci-image@2261827    active      1  kubeflow-volumes         1.8/stable       260  10.0.37.40    no       
metacontroller-operator                             active      1  metacontroller-operator  3.0/stable       252  10.0.185.47   no       
minio                      res:oci-image@1755999    active      1  minio                    ckf-1.8/stable   278  10.0.55.52    no       
mlflow-mysql               8.0.36-0ubuntu0.22.04.1  active      1  mysql-k8s                8.0/stable       153  10.0.232.225  no       
mlflow-server                                       active      1  mlflow-server            2.1/stable       466  10.0.117.149  no       
mlmd                       res:oci-image@44abc5d    active      1  mlmd                     1.14/stable      127  10.0.79.195   no       
oidc-gatekeeper                                     active      1  oidc-gatekeeper          ckf-1.8/stable   350  10.0.111.155  no       
pvcviewer-operator                                  active      1  pvcviewer-operator       1.8/stable        30  10.0.146.105  no       
resource-dispatcher                                 active      1  resource-dispatcher      1.0/stable        93  10.0.81.91    no       
seldon-controller-manager                           active      1  seldon-core              1.17/stable      664  10.0.2.218    no       
tensorboard-controller                              active      1  tensorboard-controller   1.8/stable       257  10.0.137.178  no       
tensorboards-web-app                                active      1  tensorboards-web-app     1.8/stable       245  10.0.190.230  no       
training-operator                                   active      1  training-operator        1.7/stable       347  10.0.239.121  no       

Unit                          Workload  Agent  Address      Ports          Message
admission-webhook/0*          active    idle   10.244.1.19                 
argo-controller/0*            active    idle   10.244.1.15                 
dex-auth/0*                   active    idle   10.244.0.24                 
envoy/0*                      active    idle   10.244.0.36  9090,9901/TCP  
grafana-agent-k8s/0*          active    idle   10.244.0.20                 logging-consumer: off
istio-ingressgateway/0*       active    idle   10.244.1.17                 
istio-pilot/0*                active    idle   10.244.1.13                 
jupyter-controller/0*         active    idle   10.244.1.9                  
jupyter-ui/0*                 active    idle   10.244.0.15                 
katib-controller/0*           active    idle   10.244.1.32  443,8080/TCP   
katib-db-manager/0*           active    idle   10.244.2.10                 
katib-db/0*                   active    idle   10.244.1.16                 Primary
katib-ui/0*                   active    idle   10.244.2.14                 
kfp-api/0*                    active    idle   10.244.1.10                 
kfp-db/0*                     active    idle   10.244.2.19                 Primary
kfp-metadata-writer/0*        active    idle   10.244.1.20                 
kfp-persistence/0*            active    idle   10.244.0.14                 
kfp-profile-controller/0*     active    idle   10.244.2.16                 
kfp-schedwf/0*                active    idle   10.244.1.8                  
kfp-ui/0*                     active    idle   10.244.0.19                 
kfp-viewer/0*                 active    idle   10.244.2.12                 
kfp-viz/0*                    active    idle   10.244.2.11                 
knative-eventing/0*           active    idle   10.244.1.14                 
knative-operator/0*           active    idle   10.244.1.21                 
knative-serving/0*            active    idle   10.244.1.7                  
kserve-controller/0*          active    idle   10.244.0.16                 
kubeflow-dashboard/0*         active    idle   10.244.2.8                  
kubeflow-profiles/0*          active    idle   10.244.1.11                 
kubeflow-roles/0*             active    idle   10.244.0.12                 
kubeflow-volumes/2*           active    idle   10.244.2.6   5000/TCP       
metacontroller-operator/0*    active    idle   10.244.1.18                 
minio/0*                      active    idle   10.244.2.24  9000-9001/TCP  
mlflow-mysql/0*               active    idle   10.244.0.22                 Primary
mlflow-server/0*              active    idle   10.244.0.21                 
mlmd/0*                       active    idle   10.244.2.23  8080/TCP       
oidc-gatekeeper/0*            active    idle   10.244.2.15                 
pvcviewer-operator/0*         active    idle   10.244.1.12                 
resource-dispatcher/0*        active    idle   10.244.0.17                 
seldon-controller-manager/0*  active    idle   10.244.2.13                 
tensorboard-controller/0*     active    idle   10.244.2.7                  
tensorboards-web-app/0*       active    idle   10.244.0.23                 
training-operator/0*          active    idle   10.244.0.18
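Instead of reading the whole table, the unit section of `juju status` can be filtered for anything that is not active. The sketch below is a hypothetical helper (`non_active_units`) that relies only on the tabular layout shown above: each unit row starts with `<app>/<n>`, optionally followed by `*` for the leader, and the second column is the workload status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `juju status` tabular output on stdin and prints
# "<unit> <workload-status>" for every unit whose workload is not "active".
# Unit rows are recognised by a first field shaped like "app/N" or "app/N*";
# Model, SAAS and App rows have no "/" in their first field and are skipped.
non_active_units() {
  awk '$1 ~ /^[a-z0-9-]+\/[0-9]+\*?$/ && $2 != "active" { print $1, $2 }'
}
```

Usage: `juju status -m kubeflow | non_active_units` prints nothing when every unit is active, as in the output above.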

Then I set up Python 3.8 with tox:

sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update -y
sudo apt install python3.8 python3.8-distutils python3.8-venv -y
wget https://bootstrap.pypa.io/get-pip.py
python3.8 get-pip.py
python3.8 -m pip install tox
export PATH=$PATH:/home/ubuntu/.local/bin
tox --version

Then I cloned the UATs repo and ran the UATs from the main branch:

git clone https://github.com/canonical/charmed-kubeflow-uats.git
cd charmed-kubeflow-uats/
git checkout main 
tox -e kubeflow-remote

The tests passed:

platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /opt/conda/bin/python3.8
cachedir: .pytest_cache
rootdir: /tests/.worktrees/fe86b4e255c4c695376f70061a6a645301350d5a/tests
configfile: pytest.ini
plugins: anyio-3.6.2
collecting ... collected 9 items / 4 deselected / 5 selected

test_notebooks.py::test_notebook[katib-integration] 
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running katib-integration.ipynb...
PASSED                                                                   [ 20%]
test_notebooks.py::test_notebook[kfp-v1-integration] 
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running kfp-v1-integration.ipynb...
PASSED                                                                   [ 40%]
test_notebooks.py::test_notebook[kfp-v2-integration] 
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running kfp-v2-integration.ipynb...
PASSED                                                                   [ 60%]
test_notebooks.py::test_notebook[kserve-integration] 
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running kserve-integration.ipynb...
PASSED                                                                   [ 80%]
test_notebooks.py::test_notebook[training-integration] 
-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running training-integration.ipynb...
PASSED                                                                   [100%]

================= 5 passed, 4 deselected in 1087.81s (0:18:07) =================
PASSED
--------------------------------------------------------------------------------------------- live log teardown ----------------------------------------------------------------------------------------------
INFO     test_kubeflow_workloads:test_kubeflow_workloads.py:82 Deleting Profile test-kubeflow...
INFO     httpx:_client.py:1013 HTTP Request: DELETE https://kubeflow-n577396o.hcp.uksouth.azmk8s.io/apis/kubeflow.org/v1/profiles/test-kubeflow "HTTP/1.1 200 OK"
INFO     test_kubeflow_workloads:test_kubeflow_workloads.py:141 Deleting Job test-kubeflow/test-kubeflow...
INFO     httpx:_client.py:1013 HTTP Request: DELETE https://kubeflow-n577396o.hcp.uksouth.azmk8s.io/apis/batch/v1/namespaces/test-kubeflow/jobs/test-kubeflow "HTTP/1.1 200 OK"

======================================================================================= 2 passed in 1154.07s (0:19:14) =======================================================================================
  kubeflow-remote: OK (1178.19=setup[22.82]+cmd[1155.36] seconds)
  congratulations :) (1178.33 seconds)
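To compare this passing run with the failing one in the original report, the UAT output can be saved and scanned for authorization failures. This is a minimal sketch assuming the 401s show up either as an httpx line like the `DELETE` requests above (`"HTTP/1.1 401 ..."`) or as a message containing `Unauthorized`; the failing log in the pastebin may use different wording, and `count_auth_failures` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many lines of a saved UAT log look like
# authorization failures.
# ASSUMPTION: failures contain "HTTP/1.1 401" (httpx log format) or "Unauthorized".
count_auth_failures() {
  # grep -c prints the count (0 when nothing matches) but exits 1 on no match,
  # so "|| true" keeps the function's exit status at 0.
  grep -Ec 'HTTP/1\.1 401|Unauthorized' "$1" || true
}
```

For example, run `tox -e kubeflow-remote 2>&1 | tee uats.log` and then `count_auth_failures uats.log`; a non-zero count on the one-click deployment and zero here would narrow the difference down to authentication.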

Here is the pod watch output for the test-kubeflow namespace, which shows no problems:

ubuntu@vu34wtsmbwx56BootstrapVm:~$ kubectl get po -n test-kubeflow --watch
NAME                                              READY   STATUS            RESTARTS   AGE
ml-pipeline-ui-artifact-6b89ccc469-2b72n          2/2     Running           0          48s
ml-pipeline-visualizationserver-955b54775-nkvg8   0/2     PodInitializing   0          48s
test-kubeflow-dx6vv                               2/2     Running           0          49s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     Pending           0          0s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     Pending           0          0s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     ContainerCreating   0          0s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     Running             0          5s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              1/1     Running             0          21s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Pending             0          0s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Pending             0          0s
cmaes-example-bzqd6kft-f6rvw                      0/2     Pending             0          0s
cmaes-example-bzqd6kft-f6rvw                      0/2     Pending             0          0s
cmaes-example-4qnb4dgb-6fzmt                      0/2     ContainerCreating   0          0s
cmaes-example-bzqd6kft-f6rvw                      0/2     ContainerCreating   0          0s
ml-pipeline-visualizationserver-955b54775-nkvg8   1/2     Running             0          111s
ml-pipeline-visualizationserver-955b54775-nkvg8   2/2     Running             0          111s
cmaes-example-4qnb4dgb-6fzmt                      2/2     Running             0          42s
cmaes-example-bzqd6kft-f6rvw                      2/2     Running             0          45s
cmaes-example-bzqd6kft-f6rvw                      1/2     NotReady            0          90s
cmaes-example-bzqd6kft-f6rvw                      0/2     Completed           0          92s
cmaes-example-bzqd6kft-f6rvw                      0/2     Completed           0          93s
cmaes-example-bzqd6kft-f6rvw                      0/2     Completed           0          94s
cmaes-example-bzqd6kft-f6rvw                      0/2     Completed           0          94s
cmaes-example-bzqd6kft-f6rvw                      0/2     Terminating         0          94s
cmaes-example-bzqd6kft-f6rvw                      0/2     Terminating         0          94s
cmaes-example-xqr2mc8k-6gn27                      0/2     Pending             0          0s
cmaes-example-xqr2mc8k-6gn27                      0/2     Pending             0          0s
cmaes-example-xqr2mc8k-6gn27                      0/2     ContainerCreating   0          0s
cmaes-example-4qnb4dgb-6fzmt                      1/2     NotReady            0          95s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Completed           0          96s
cmaes-example-xqr2mc8k-6gn27                      2/2     Running             0          2s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Completed           0          98s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Completed           0          98s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Completed           0          99s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Terminating         0          99s
cmaes-example-4qnb4dgb-6fzmt                      0/2     Terminating         0          99s
cmaes-example-xqr2mc8k-6gn27                      1/2     NotReady            0          58s
cmaes-example-xqr2mc8k-6gn27                      0/2     Completed           0          60s
cmaes-example-xqr2mc8k-6gn27                      0/2     Completed           0          61s
cmaes-example-xqr2mc8k-6gn27                      0/2     Completed           0          62s
cmaes-example-xqr2mc8k-6gn27                      0/2     Completed           0          62s
cmaes-example-xqr2mc8k-6gn27                      0/2     Terminating         0          62s
cmaes-example-xqr2mc8k-6gn27                      0/2     Terminating         0          62s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              1/1     Terminating         0          2m58s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     Terminating         0          2m59s
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     Terminating         0          3m
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     Terminating         0          3m
cmaes-example-cmaes-75f5b9d5dd-r8m8k              0/1     Terminating         0          3m
calculation-pipeline-wkrc2-4050137206             0/2     Pending             0          0s
calculation-pipeline-wkrc2-4050137206             0/2     Pending             0          0s
calculation-pipeline-wkrc2-4050137206             0/2     Init:0/1            0          0s
calculation-pipeline-wkrc2-3212673545             0/2     Pending             0          0s
calculation-pipeline-wkrc2-3212673545             0/2     Pending             0          0s
calculation-pipeline-wkrc2-3212673545             0/2     Init:0/1            0          0s
calculation-pipeline-wkrc2-4050137206             0/2     Init:0/1            0          0s
calculation-pipeline-wkrc2-3212673545             0/2     Init:0/1            0          0s
calculation-pipeline-wkrc2-4050137206             0/2     Init:0/1            0          8s
calculation-pipeline-wkrc2-3212673545             0/2     Init:0/1            0          9s
calculation-pipeline-wkrc2-4050137206             0/2     PodInitializing     0          9s
calculation-pipeline-wkrc2-3212673545             0/2     PodInitializing     0          10s
calculation-pipeline-wkrc2-4050137206             2/2     Running             0          29s
calculation-pipeline-wkrc2-3212673545             2/2     Running             0          30s
calculation-pipeline-wkrc2-4050137206             2/2     Running             0          30s
calculation-pipeline-wkrc2-4050137206             2/2     Running             0          31s
calculation-pipeline-wkrc2-3212673545             2/2     Running             0          31s
calculation-pipeline-wkrc2-3212673545             2/2     Running             0          31s
calculation-pipeline-wkrc2-3212673545             0/2     Completed           0          35s
calculation-pipeline-wkrc2-4050137206             0/2     Completed           0          35s
calculation-pipeline-wkrc2-3212673545             0/2     Completed           0          37s
calculation-pipeline-wkrc2-4050137206             0/2     Completed           0          37s
calculation-pipeline-wkrc2-4050137206             0/2     Completed           0          37s
calculation-pipeline-wkrc2-3212673545             0/2     Completed           0          37s
calculation-pipeline-wkrc2-3195895926             0/2     Pending             0          0s
calculation-pipeline-wkrc2-3195895926             0/2     Pending             0          0s
calculation-pipeline-wkrc2-3195895926             0/2     Init:0/1            0          0s
calculation-pipeline-wkrc2-4050137206             0/2     Completed           0          39s
calculation-pipeline-wkrc2-3212673545             0/2     Completed           0          39s
calculation-pipeline-wkrc2-3195895926             0/2     Init:0/1            0          0s
calculation-pipeline-wkrc2-3195895926             0/2     Init:0/1            0          1s
calculation-pipeline-wkrc2-3195895926             0/2     PodInitializing     0          2s
calculation-pipeline-wkrc2-3195895926             2/2     Running             0          3s
calculation-pipeline-wkrc2-3195895926             1/2     NotReady            0          4s
calculation-pipeline-wkrc2-3195895926             1/2     NotReady            0          4s
calculation-pipeline-wkrc2-3195895926             1/2     NotReady            0          5s
calculation-pipeline-wkrc2-3195895926             0/2     Completed           0          5s
calculation-pipeline-wkrc2-3195895926             0/2     Completed           0          6s
calculation-pipeline-wkrc2-3195895926             0/2     Completed           0          7s
calculation-pipeline-wkrc2-3195895926             0/2     Completed           0          10s
condition-v2-x6h2p-502777903                      0/2     Pending             0          0s
condition-v2-x6h2p-502777903                      0/2     Pending             0          0s
condition-v2-x6h2p-502777903                      0/2     Init:0/1            0          0s
condition-v2-x6h2p-502777903                      0/2     PodInitializing     0          1s
condition-v2-x6h2p-502777903                      2/2     Running             0          6s
condition-v2-x6h2p-502777903                      2/2     Running             0          7s
condition-v2-x6h2p-502777903                      0/2     Completed           0          7s
condition-v2-x6h2p-502777903                      0/2     Completed           0          9s
condition-v2-x6h2p-502777903                      0/2     Completed           0          9s
condition-v2-x6h2p-3683981472                     0/2     Pending             0          0s
condition-v2-x6h2p-3683981472                     0/2     Pending             0          0s
condition-v2-x6h2p-3683981472                     0/2     Init:0/1            0          0s
condition-v2-x6h2p-502777903                      0/2     Completed           0          10s
condition-v2-x6h2p-3683981472                     0/2     PodInitializing     0          1s
condition-v2-x6h2p-3683981472                     2/2     Running             0          2s
condition-v2-x6h2p-3683981472                     1/2     NotReady            0          3s
condition-v2-x6h2p-3683981472                     1/2     NotReady            0          4s
condition-v2-x6h2p-3683981472                     0/2     Completed           0          4s
condition-v2-x6h2p-3683981472                     0/2     Completed           0          6s
condition-v2-x6h2p-3683981472                     0/2     Completed           0          6s
condition-v2-x6h2p-135267782                      0/2     Pending             0          0s
condition-v2-x6h2p-135267782                      0/2     Pending             0          0s
condition-v2-x6h2p-135267782                      0/2     Init:0/2            0          0s
condition-v2-x6h2p-3683981472                     0/2     Completed           0          10s
condition-v2-x6h2p-135267782                      0/2     Init:1/2            0          1s
condition-v2-x6h2p-135267782                      0/2     Init:1/2            0          5s
condition-v2-x6h2p-135267782                      0/2     PodInitializing     0          6s
condition-v2-x6h2p-135267782                      2/2     Running             0          7s
condition-v2-x6h2p-135267782                      0/2     Completed           0          14s
condition-v2-x6h2p-135267782                      0/2     Completed           0          16s
condition-v2-x6h2p-135267782                      0/2     Completed           0          16s
condition-v2-x6h2p-884988224                      0/2     Pending             0          0s
condition-v2-x6h2p-884988224                      0/2     Pending             0          0s
condition-v2-x6h2p-884988224                      0/2     Init:0/1            0          0s
condition-v2-x6h2p-756913840                      0/2     Pending             0          0s
condition-v2-x6h2p-756913840                      0/2     Pending             0          0s
condition-v2-x6h2p-756913840                      0/2     Init:0/1            0          0s
condition-v2-x6h2p-135267782                      0/2     Completed           0          25s
condition-v2-x6h2p-884988224                      0/2     PodInitializing     0          2s
condition-v2-x6h2p-884988224                      2/2     Running             0          3s
condition-v2-x6h2p-884988224                      1/2     NotReady            0          4s
condition-v2-x6h2p-884988224                      1/2     NotReady            0          4s
condition-v2-x6h2p-884988224                      0/2     Completed           0          5s
condition-v2-x6h2p-884988224                      0/2     Completed           0          6s
condition-v2-x6h2p-884988224                      0/2     Completed           0          7s
condition-v2-x6h2p-756913840                      0/2     Init:0/1            0          8s
condition-v2-x6h2p-884988224                      0/2     Completed           0          10s
condition-v2-x6h2p-756913840                      0/2     PodInitializing     0          10s
condition-v2-x6h2p-756913840                      2/2     Running             0          14s
condition-v2-x6h2p-756913840                      2/2     Running             0          15s
condition-v2-x6h2p-756913840                      0/2     Completed           0          16s
condition-v2-x6h2p-756913840                      0/2     Completed           0          17s
condition-v2-x6h2p-756913840                      0/2     Completed           0          18s
condition-v2-x6h2p-3477408950                     0/2     Pending             0          0s
condition-v2-x6h2p-3477408950                     0/2     Pending             0          0s
condition-v2-x6h2p-3477408950                     0/2     Init:0/2            0          0s
condition-v2-x6h2p-756913840                      0/2     Completed           0          20s
condition-v2-x6h2p-3477408950                     0/2     Init:1/2            0          2s
condition-v2-x6h2p-3477408950                     0/2     PodInitializing     0          3s
condition-v2-x6h2p-3477408950                     2/2     Running             0          4s
condition-v2-x6h2p-3477408950                     0/2     Completed           0          11s
condition-v2-x6h2p-3477408950                     0/2     Completed           0          12s
condition-v2-x6h2p-3477408950                     0/2     Completed           0          13s
condition-v2-x6h2p-3477408950                     0/2     Completed           0          21s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Pending             0          0s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Pending             0          0s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Init:0/1            0          0s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Init:0/1            0          15s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     PodInitializing     0          26s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   1/2     Running             0          46s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   2/2     Running             0          60s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   2/2     Terminating         0          62s
mnist-chief-0                                              0/1     Pending             0          0s
mnist-chief-0                                              0/1     Pending             0          0s
mnist-chief-0                                              0/1     ContainerCreating   0          0s
mnist-ps-0                                                 0/1     Pending             0          0s
mnist-ps-0                                                 0/1     Pending             0          0s
mnist-ps-0                                                 0/1     ContainerCreating   0          0s
mnist-worker-0                                             0/1     Pending             0          0s
mnist-worker-0                                             0/1     Pending             0          0s
mnist-worker-0                                             0/1     ContainerCreating   0          0s
mnist-worker-1                                             0/1     Pending             0          0s
mnist-worker-1                                             0/1     Pending             0          0s
mnist-worker-1                                             0/1     ContainerCreating   0          0s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   1/2     Terminating         0          90s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Terminating         0          94s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Terminating         0          94s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Terminating         0          94s
sklearn-iris-predictor-00001-deployment-6f6f6bc97b-4qmp8   0/2     Terminating         0          94s
mnist-chief-0                                              1/1     Running             0          27s
mnist-worker-0                                             1/1     Running             0          27s
mnist-ps-0                                                 1/1     Running             0          28s
mnist-worker-1                                             1/1     Running             0          29s
mnist-worker-0                                             0/1     Completed           0          2m36s
mnist-chief-0                                              0/1     Completed           0          2m37s
mnist-worker-0                                             0/1     Completed           0          2m38s
mnist-chief-0                                              0/1     Completed           0          2m38s
mnist-worker-0                                             0/1     Completed           0          2m38s
mnist-chief-0                                              0/1     Completed           0          2m39s
mnist-worker-0                                             0/1     Terminating         0          3m
mnist-worker-1                                             1/1     Terminating         0          3m
mnist-chief-0                                              0/1     Terminating         0          3m
mnist-chief-0                                              0/1     Terminating         0          3m
mnist-ps-0                                                 1/1     Terminating         0          3m
mnist-worker-0                                             0/1     Terminating         0          3m
pytorch-mnist-gloo-master-0                                0/1     Pending             0          0s
pytorch-mnist-gloo-master-0                                0/1     Pending             0          0s
pytorch-mnist-gloo-master-0                                0/1     ContainerCreating   0          0s
pytorch-mnist-gloo-worker-0                                0/1     Pending             0          0s
pytorch-mnist-gloo-worker-0                                0/1     Pending             0          0s
pytorch-mnist-gloo-worker-0                                0/1     Init:0/1            0          0s
mnist-ps-0                                                 0/1     Terminating         0          3m3s
mnist-ps-0                                                 0/1     Terminating         0          3m3s
mnist-ps-0                                                 0/1     Terminating         0          3m3s
mnist-ps-0                                                 0/1     Terminating         0          3m3s
mnist-worker-1                                             0/1     Terminating         0          3m3s
mnist-worker-1                                             0/1     Terminating         0          3m4s
mnist-worker-1                                             0/1     Terminating         0          3m4s
mnist-worker-1                                             0/1     Terminating         0          3m4s
pytorch-mnist-gloo-master-0                                1/1     Running             0          78s
pytorch-mnist-gloo-worker-0                                0/1     Init:0/1            0          80s
pytorch-mnist-gloo-worker-0                                0/1     PodInitializing     0          85s
pytorch-mnist-gloo-worker-0                                1/1     Running             0          86s
pytorch-mnist-gloo-worker-0                                0/1     Completed           0          4m26s
pytorch-mnist-gloo-master-0                                0/1     Completed           0          4m27s
pytorch-mnist-gloo-worker-0                                0/1     Completed           0          4m27s
pytorch-mnist-gloo-worker-0                                0/1     Completed           0          4m28s
pytorch-mnist-gloo-master-0                                0/1     Completed           0          4m28s
pytorch-mnist-gloo-master-0                                0/1     Completed           0          4m29s
pytorch-mnist-gloo-worker-0                                0/1     Terminating         0          4m30s
pytorch-mnist-gloo-master-0                                0/1     Terminating         0          4m30s
pytorch-mnist-gloo-worker-0                                0/1     Terminating         0          4m30s
pytorch-mnist-gloo-master-0                                0/1     Terminating         0          4m30s
paddle-simple-cpu-worker-0                                 0/1     Pending             0          0s
paddle-simple-cpu-worker-0                                 0/1     Pending             0          0s
paddle-simple-cpu-worker-0                                 0/1     ContainerCreating   0          0s
paddle-simple-cpu-worker-1                                 0/1     Pending             0          0s
paddle-simple-cpu-worker-1                                 0/1     Pending             0          0s
paddle-simple-cpu-worker-1                                 0/1     ContainerCreating   0          0s
paddle-simple-cpu-worker-1                                 1/1     Running             0          78s
paddle-simple-cpu-worker-0                                 1/1     Running             0          79s
paddle-simple-cpu-worker-0                                 0/1     Completed           0          96s
paddle-simple-cpu-worker-1                                 0/1     Completed           0          96s
paddle-simple-cpu-worker-0                                 0/1     Completed           0          98s
paddle-simple-cpu-worker-1                                 0/1     Completed           0          98s
paddle-simple-cpu-worker-1                                 0/1     Completed           0          98s
paddle-simple-cpu-worker-0                                 0/1     Completed           0          98s
paddle-simple-cpu-worker-1                                 0/1     Terminating         0          2m
paddle-simple-cpu-worker-0                                 0/1     Terminating         0          2m
paddle-simple-cpu-worker-1                                 0/1     Terminating         0          2m
paddle-simple-cpu-worker-0                                 0/1     Terminating         0          2m
test-kubeflow-dx6vv                                        1/2     NotReady            0          18m
test-kubeflow-dx6vv                                        0/2     Completed           0          19m
test-kubeflow-dx6vv                                        0/2     Completed           0          19m
test-kubeflow-dx6vv                                        0/2     Completed           0          19m
test-kubeflow-dx6vv                                        0/2     Completed           0          19m
test-kubeflow-dx6vv                                        0/2     Completed           0          19m
calculation-pipeline-wkrc2-3195895926                      0/2     Terminating         0          14m
calculation-pipeline-wkrc2-3195895926                      0/2     Terminating         0          14m
calculation-pipeline-wkrc2-3212673545                      0/2     Terminating         0          14m
calculation-pipeline-wkrc2-3212673545                      0/2     Terminating         0          14m
calculation-pipeline-wkrc2-4050137206                      0/2     Terminating         0          14m
calculation-pipeline-wkrc2-4050137206                      0/2     Terminating         0          14m
condition-v2-x6h2p-135267782                               0/2     Terminating         0          13m
condition-v2-x6h2p-135267782                               0/2     Terminating         0          13m
condition-v2-x6h2p-3477408950                              0/2     Terminating         0          12m
condition-v2-x6h2p-3477408950                              0/2     Terminating         0          12m
condition-v2-x6h2p-3683981472                              0/2     Terminating         0          13m
condition-v2-x6h2p-3683981472                              0/2     Terminating         0          13m
condition-v2-x6h2p-502777903                               0/2     Terminating         0          13m
condition-v2-x6h2p-502777903                               0/2     Terminating         0          13m
condition-v2-x6h2p-756913840                               0/2     Terminating         0          13m
condition-v2-x6h2p-756913840                               0/2     Terminating         0          13m
condition-v2-x6h2p-884988224                               0/2     Terminating         0          13m
condition-v2-x6h2p-884988224                               0/2     Terminating         0          13m
ml-pipeline-ui-artifact-6b89ccc469-2b72n                   2/2     Terminating         0          19m
ml-pipeline-visualizationserver-955b54775-nkvg8            2/2     Terminating         0          19m
test-kubeflow-dx6vv                                        0/2     Terminating         0          19m
test-kubeflow-dx6vv                                        0/2     Terminating         0          19m
ml-pipeline-visualizationserver-955b54775-nkvg8            0/2     Terminating         0          19m
ml-pipeline-ui-artifact-6b89ccc469-2b72n                   0/2     Terminating         0          19m
ml-pipeline-ui-artifact-6b89ccc469-2b72n                   0/2     Terminating         0          19m
ml-pipeline-ui-artifact-6b89ccc469-2b72n                   0/2     Terminating         0          19m
ml-pipeline-ui-artifact-6b89ccc469-2b72n                   0/2     Terminating         0          19m
ml-pipeline-visualizationserver-955b54775-nkvg8            0/2     Terminating         0          19m
ml-pipeline-visualizationserver-955b54775-nkvg8            0/2     Terminating         0          19m
ml-pipeline-visualizationserver-955b54775-nkvg8            0/2     Terminating         0          19m

Here is the experiments log, which also completed without problems:

ubuntu@vu34wtsmbwx56BootstrapVm:~$ kubectl get experiment -n test-kubeflow --watch
NAME            TYPE      STATUS   AGE
cmaes-example   Created   True     19s
cmaes-example   Running   True     22s
cmaes-example   Running   True     22s
cmaes-example   Running   True     22s
cmaes-example   Running   True     116s
cmaes-example   Running   True     116s
cmaes-example   Running   True     116s
cmaes-example   Running   True     2m1s
cmaes-example   Running   True     2m1s
cmaes-example   Succeeded   True     2m58s
cmaes-example   Succeeded   True     3m5s
cmaes-example   Succeeded   True     3m5s

I have run the UATs twice without problems.

misohu commented 16 hours ago

After further investigation, we found that the error was caused by running the tox environment with Python 3.10. Switching to Python 3.8 resolved the issue; the tests are meant to be executed on Python 3.8.
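The thread does not show how the UATs pin their interpreter. One way to catch this mismatch early is to check the interpreter version before any test runs, so a wrong Python fails with a clear message instead of a confusing 401. The sketch below assumes that Python 3.8 is the only supported version, as stated in the comment above. The helper name `check_python_version` is hypothetical and is not part of the CKF test suite.

```python
import sys

# Assumption: the UATs are only validated on Python 3.8 (per the comment above).
REQUIRED = (3, 8)


def check_python_version(version_info=sys.version_info, required=REQUIRED):
    """Return an error message if major.minor differs from `required`, else None.

    Hypothetical helper: `version_info` defaults to the running interpreter
    but accepts any tuple-like (major, minor, ...) for easy testing.
    """
    current = tuple(version_info[:2])
    if current != required:
        return (
            f"UATs expect Python {required[0]}.{required[1]}, "
            f"but this interpreter is {current[0]}.{current[1]}"
        )
    return None
```

For example, a `conftest.py` could call this from a `pytest_configure` hook and abort with `pytest.exit(...)` when it returns a message. Alternatively, pinning the interpreter in `tox.ini` with `basepython = python3.8` makes tox itself refuse to build the environment with another Python.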