canonical / kfp-operators

Kubeflow Pipelines Operators
Apache License 2.0

Getting KeyError when checking viz_server_healthcheck #561

Open kimwnasptd opened 1 month ago

kimwnasptd commented 1 month ago

Bug Description

CI failed on a kfp-operators PR with a KeyError in test_viz_server_healthcheck: the test could not find an address for the kfp-viz unit. Seen in https://github.com/canonical/kfp-operators/pull/393 (CI run: https://github.com/canonical/kfp-operators/actions/runs/10366700320/job/28696539172?pr=393).

Could this be because the charm is not up yet? In the status output below, kfp-viz/0 is still in waiting/allocating and has no address.
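If the missing address is indeed a timing issue, one possible guard is to poll for the address instead of indexing the unit status directly. The following is only a minimal sketch, not the test suite's actual code: `wait_for_unit_address` and its `get_unit` callback are hypothetical names, and it assumes the unit status is a dict-like snapshot whose "address" key is absent or empty until the pod has been scheduled.

```python
import time
from typing import Callable, Mapping


def wait_for_unit_address(
    get_unit: Callable[[], Mapping[str, object]],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a unit status snapshot until it reports an address.

    get_unit is a hypothetical callback that returns a fresh status
    mapping for the unit on every call. Raises TimeoutError if no
    address appears within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        unit = get_unit()
        # .get() avoids the KeyError seen in the traceback when the key is missing.
        address = unit.get("address")
        if address:
            return str(address)
        if clock() >= deadline:
            raise TimeoutError(
                f"unit reported no address within {timeout} seconds; "
                f"last status keys: {sorted(unit)}"
            )
        sleep(interval)
```

In the test, the direct lookup `kfp_viz_unit["address"]` would then be replaced by a call to this helper with a callback that re-reads the unit status. A timeout with the last seen status keys in the message is easier to diagnose than a bare KeyError.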

To Reproduce

This happens intermittently in CI; I haven't tried to reproduce it locally.

Environment

latest/edge branch

Relevant Log Output

_________________________ test_viz_server_healthcheck __________________________
Traceback (most recent call last):
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/bundle-integration-v2/lib/python3.8/site-packages/_pytest/runner.py", line 341, in from_call
    result: TResult | None = func()
...
...
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/runner/work/kfp-operators/kfp-operators/tests/integration/test_kfp_functional_v2.py", line 244, in test_viz_server_healthcheck
    url = kfp_viz_unit["address"]
KeyError: 'address'
...
Model     Controller                Cloud/Region        Version  SLA          Timestamp
kubeflow  github-pr-384f0-microk8s  microk8s/localhost  3.4.5    unsupported  11:01:35Z

App                      Version  Status   Scale  Charm                    Channel       Rev  Address         Exposed  Message
argo-controller                   waiting      1  argo-controller          latest/edge   549  10.152.183.219  no       installing agent
envoy                             waiting      1  envoy                    latest/edge   277  10.152.183.237  no       installing agent
istio-ingressgateway              waiting      1  istio-gateway            latest/edge  1148  10.152.183.207  no       installing agent
istio-pilot                       waiting    0/1  istio-pilot              latest/edge  1100                  no       installing agent
kfp-api                           waiting    0/1  kfp-api                                  0                  no       installing agent
kfp-db                            waiting    0/1  mysql-k8s                8.0/stable    153                  no       installing agent
kfp-metadata-writer               waiting    0/1  kfp-metadata-writer                      0                  no       installing agent
kfp-persistence                   waiting    0/1  kfp-persistence                          0                  no       installing agent
kfp-profile-controller            waiting    0/1  kfp-profile-controller                   0                  no       installing agent
kfp-schedwf                       waiting    0/1  kfp-schedwf                              0                  no       installing agent
kfp-ui                            waiting    0/1  kfp-ui                                   0                  no       installing agent
kfp-viewer                        waiting    0/1  kfp-viewer                               0                  no       installing agent
kfp-viz                           waiting    0/1  kfp-viz                                  0                  no       installing agent
kubeflow-profiles                 waiting    0/1  kubeflow-profiles        latest/edge   425                  no       installing agent
kubeflow-roles                    waiting    0/1  kubeflow-roles           latest/edge   248                  no       installing agent
metacontroller-operator           waiting    0/1  metacontroller-operator  latest/edge   318                  no       installing agent
minio                             waiting    0/1  minio                    latest/edge   357                  no       installing agent
mlmd                              waiting    0/1  mlmd                     latest/edge   219                  no       installing agent

Unit                       Workload  Agent       Address       Ports  Message
argo-controller/0*         blocked   idle        10.1.164.138         [relation:object_storage] Expected data from exactly 1 related applications - got 0.
envoy/0*                   blocked   executing   10.1.164.139         [grpc] Missing relation with a k8s service info provider. Please add the missing relation.
istio-ingressgateway/0*    blocked   executing   10.1.164.140         Please add required relation to istio-pilot
istio-pilot/0*             waiting   allocating                       agent initialising
kfp-api/0                  waiting   allocating                       installing agent
kfp-db/0                   waiting   allocating                       installing agent
kfp-metadata-writer/0      waiting   allocating                       installing agent
kfp-persistence/0          waiting   allocating                       installing agent
kfp-profile-controller/0   waiting   allocating                       installing agent
kfp-schedwf/0              waiting   allocating                       installing agent
kfp-ui/0                   waiting   allocating                       installing agent
kfp-viewer/0               waiting   allocating                       installing agent
kfp-viz/0                  waiting   allocating                       installing agent
kubeflow-profiles/0        waiting   allocating                       installing agent
kubeflow-roles/0           waiting   allocating                       installing agent
metacontroller-operator/0  waiting   allocating                       installing agent
minio/0                    waiting   allocating                       installing agent
mlmd/0                     waiting   allocating                       installing agent

Additional Context

No response

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6134.

This message was autogenerated

misohu commented 1 month ago

Hey @kimwnasptd, based on the provided logs, the tests had already failed at test_upload_pipeline.

After a rerun, the tests passed. We think this may be related to an intermittent runner problem.