canonical / kfp-operators

Kubeflow Pipelines Operators
Apache License 2.0

`kfp-api` tests are flaky when asserting response metrics #445

Open orfeas-k opened 7 months ago

orfeas-k commented 7 months ago

Bug Description

The `kfp-api` integration tests are flaky: `test_prometheus_grafana_integration` sometimes fails when asserting response metrics, with the following error:

  File "/home/runner/work/kfp-operators/kfp-operators/charms/kfp-api/tests/integration/test_charm.py", line 264, in test_prometheus_grafana_integration
    response_metric = response["data"]["result"][0]["metric"]
IndexError: list index out of range

This time it happened in track/2.0, but I think I've seen it in main in the past as well.
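For context, the log output further down shows every attempt reporting `Response status is success` before the failure. That is consistent with Prometheus answering the query successfully but with an empty result list, for example because no series had been scraped yet. A minimal sketch of that failure mode follows; the payload is an illustrative stand-in shaped like a Prometheus instant-query response, not data captured from the failing run, and `first_metric` is a hypothetical helper name.

```python
# Illustrative stand-in for a Prometheus instant-query response that matched
# no series. This payload is an assumption, not taken from the failing CI run.
empty_response = {
    "status": "success",
    "data": {"resultType": "vector", "result": []},
}


def first_metric(response):
    """Mirror the indexing done on the failing line of test_charm.py."""
    return response["data"]["result"][0]["metric"]


try:
    first_metric(empty_response)
except IndexError as err:
    # The status check passes, but indexing the empty list fails.
    print(f"IndexError: {err}")
```

If this is what happens in CI, a status check alone cannot tell "Prometheus is up" apart from "Prometheus has scraped the charm".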

Cause

My guess is that this is caused by limited resources on the runner, and that increasing the number of retry attempts will resolve it.
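A rough sketch of what "more retry attempts" could look like, assuming the root cause is that Prometheus has not scraped any series yet. It uses only the standard library rather than the tenacity-based retry the test currently relies on. The names `wait_for_metric`, `fetch`, `attempts`, and `delay` are hypothetical, and `fake_fetch` is a stand-in for the real HTTP query with made-up data.

```python
import time


def wait_for_metric(fetch, attempts=10, delay=0.0):
    """Call fetch() until its result list is non-empty, then return the first metric.

    `fetch` is any callable returning a Prometheus-style response dict.
    All names here are hypothetical and exist only for this sketch.
    """
    for attempt in range(1, attempts + 1):
        response = fetch()
        assert response["status"] == "success"
        results = response["data"]["result"]
        if results:
            return results[0]["metric"]
        if attempt < attempts:
            time.sleep(delay)  # give Prometheus time to scrape before retrying
    raise AssertionError(f"no metrics returned after {attempts} attempts")


# Fake fetcher standing in for the HTTP query: empty for the first 6 calls,
# then one made-up series. Five attempts would not have been enough here.
calls = {"count": 0}


def fake_fetch():
    calls["count"] += 1
    result = [] if calls["count"] <= 6 else [{"metric": {"__name__": "up"}}]
    return {"status": "success", "data": {"resultType": "vector", "result": result}}


metric = wait_for_metric(fake_fetch, attempts=10)
print(metric, calls["count"])
```

Checking for an empty result explicitly also turns the bare `IndexError` into a message that says what was actually missing, which may make future flaky runs easier to diagnose.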

To Reproduce

Run the integration CI multiple times.

Environment

GitHub runners, LXD

  Name  Version        Rev    Tracking      Publisher    Notes
  lxd   4.0.9-a29c6f1  24061  4.0/stable/…  canonical**  -
  /usr/bin/sudo snap install lxd --channel=latest/stable
  snap "lxd" is already installed, see 'snap help refresh'
  /usr/bin/sudo snap refresh lxd --channel=latest/stable
  lxd 5.21.1-10f4115 from Canonical** refreshed

  juju (3.1/stable) 3.1.8 from Canonical installed
  charmcraft (candidate) 2.6.0 from Canonical installed
  microk8s (1.25-strict/stable) v1.25.16 from Canonical** installed

juju status:

Model     Controller                Cloud/Region        Version  SLA          Timestamp
kubeflow  github-pr-07313-microk8s  microk8s/localhost  3.1.8    unsupported  10:30:16Z

App                           Version                  Status  Scale  Charm                         Channel         Rev  Address         Exposed  Message
grafana-k8s                   9.2.1                    active      1  grafana-k8s                   1.0/stable       93  10.152.183.211  no       
kfp-db                        mariadb/server:10.3      active      1  charmed-osm-mariadb-k8s       latest/stable    35  10.152.183.237  no       ready
kfp-viz                                                active      1  kfp-viz                       2.0/stable      985  10.152.183.226  no       
minio                         res:oci-image@1755999    active      1  minio                         ckf-1.8/stable  278  10.152.183.215  no       
mysql-k8s                     8.0.35-0ubuntu0.22.04.1  active      1  mysql-k8s                     8.0/stable      127  10.152.183.92   no       
prometheus-k8s                2.47.2                   active      1  prometheus-k8s                1.0/stable      159  10.152.183.217  no       
prometheus-scrape-config-k8s  n/a                      active      1  prometheus-scrape-config-k8s  1.0/stable       44  10.152.183.71   no       

Unit                             Workload  Agent  Address     Ports          Message
grafana-k8s/0*                   active    idle   10.1.88.26                 
kfp-db/0*                        active    idle   10.1.88.15  3306/TCP       ready
kfp-viz/0*                       active    idle   10.1.88.13                 
minio/0*                         active    idle   10.1.88.17  9000-9001/TCP  
mysql-k8s/0*                     active    idle   10.1.88.19                 Primary
prometheus-k8s/0*                active    idle   10.1.88.25                 
prometheus-scrape-config-k8s/0*  active    idle   10.1.88.22     

Relevant Log Output

tests/integration/test_charm.py::TestCharm::test_prometheus_grafana_integration 
-------------------------------- live log call ---------------------------------
INFO     juju.model:model.py:2069 Deploying ch:amd64/focal/prometheus-k8s-159
INFO     juju.model:model.py:2069 Deploying ch:amd64/focal/grafana-k8s-93
INFO     juju.model:model.py:2069 Deploying ch:amd64/focal/prometheus-scrape-config-k8s-44
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [allocating] waiting: installing agent
  prometheus-k8s/0 [allocating] waiting: installing agent
  prometheus-scrape-config-k8s/0 [allocating] waiting: installing agent
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [allocating] waiting: installing agent
  prometheus-k8s/0 [executing] maintenance: installing charm software
  prometheus-scrape-config-k8s/0 [idle] active: 
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [executing] maintenance: installing charm software
  prometheus-k8s/0 [executing] active: 
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [idle] active: 
INFO     test_charm:test_charm.py:248 Prometheus available at http://10.1.88.25:9090
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 1)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 2)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 3)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 4)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 5)
INFO     test_charm:test_charm.py:261 Response status is success
FAILED
tests/integration/test_charm.py::TestCharm::test_remove_application PASSED
------------------------------ live log teardown -------------------------------
INFO     pytest_operator.plugin:plugin.py:783 Model status:

Model     Controller                Cloud/Region        Version  SLA          Timestamp
kubeflow  github-pr-07313-microk8s  microk8s/localhost  3.1.8    unsupported  10:30:16Z

App                           Version                  Status  Scale  Charm                         Channel         Rev  Address         Exposed  Message
grafana-k8s                   9.2.1                    active      1  grafana-k8s                   1.0/stable       93  10.152.183.211  no       
kfp-db                        mariadb/server:10.3      active      1  charmed-osm-mariadb-k8s       latest/stable    35  10.152.183.237  no       ready
kfp-viz                                                active      1  kfp-viz                       2.0/stable      985  10.152.183.226  no       
minio                         res:oci-image@1755999    active      1  minio                         ckf-1.8/stable  278  10.152.183.215  no       
mysql-k8s                     8.0.35-0ubuntu0.22.04.1  active      1  mysql-k8s                     8.0/stable      127  10.152.183.92   no       
prometheus-k8s                2.47.2                   active      1  prometheus-k8s                1.0/stable      159  10.152.183.217  no       
prometheus-scrape-config-k8s  n/a                      active      1  prometheus-scrape-config-k8s  1.0/stable       44  10.152.183.71   no       

Unit                             Workload  Agent  Address     Ports          Message
grafana-k8s/0*                   active    idle   10.1.88.26                 
kfp-db/0*                        active    idle   10.1.88.15  3306/TCP       ready
kfp-viz/0*                       active    idle   10.1.88.13                 
minio/0*                         active    idle   10.1.88.17  9000-9001/TCP  
mysql-k8s/0*                     active    idle   10.1.88.19                 Primary
prometheus-k8s/0*                active    idle   10.1.88.25                 
prometheus-scrape-config-k8s/0*  active    idle   10.1.88.22                 

INFO     pytest_operator.plugin:plugin.py:789 Juju error logs:

unit-kfp-api-0: 10:19:36 ERROR unit.kfp-api/0.juju-log Failed to handle <InstallEvent via KfpApiOperator/on/install[1]> with error: Please add required relation object-storage
unit-kfp-api-0: 10:19:39 ERROR unit.kfp-api/0.juju-log Failed to handle <LeaderElectedEvent via KfpApiOperator/on/leader_elected[21]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-api-0: 10:19:40 ERROR unit.kfp-api/0.juju-log Failed to handle <PebbleReadyEvent via KfpApiOperator/on/apiserver_pebble_ready[26]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-api-0: 10:19:41 ERROR unit.kfp-api/0.juju-log Failed to handle <ConfigChangedEvent via KfpApiOperator/on/config_changed[31]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-api-0: 10:19:47 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to handle <RelationJoinedEvent via KfpApiOperator/on/mysql_relation_joined[41]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-api-0: 10:19:47 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to handle <RelationChangedEvent via KfpApiOperator/on/mysql_relation_changed[46]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-api-0: 10:19:53 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to handle <RelationChangedEvent via KfpApiOperator/on/mysql_relation_changed[51]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-api-0: 10:19:55 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to handle <RelationChangedEvent via KfpApiOperator/on/mysql_relation_changed[56]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-viz-0: 10:19:57 ERROR unit.kfp-viz/0.juju-log kfp-viz:2: execute_components caught unhandled exception when executing configure_charm for relation:kfp-viz
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 294, in get_sdi_interface
    interface = get_interface(charm, relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 388, in get_interface
    instance.get_data()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 172, in get_data
    rel_data = self.unwrap(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 269, in unwrap
    version = self.get_version(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 111, in get_version
    raise errors.UnversionedRelation(relation)
serialized_data_interface.errors.UnversionedRelation: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/charm_reconciler.py", line 92, in reconcile
    component_item.component.configure_charm(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 51, in configure_charm
    self._configure_app(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 88, in _configure_app
    self._configure_app_leader(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 221, in _configure_app_leader
    interface = self.get_interface()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 229, in get_interface
    return get_sdi_interface(self._charm, self._relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 297, in get_sdi_interface
    raise ErrorWithStatus(str(err), WaitingStatus) from err
charmed_kubeflow_chisme.exceptions._with_status.ErrorWithStatus: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api
unit-kfp-api-0: 10:19:58 ERROR unit.kfp-api/0.juju-log kfp-viz:2: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_viz_relation_changed[66]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-viz-0: 10:19:58 ERROR unit.kfp-viz/0.juju-log execute_components caught unhandled exception when executing configure_charm for relation:kfp-viz
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 294, in get_sdi_interface
    interface = get_interface(charm, relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 388, in get_interface
    instance.get_data()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 172, in get_data
    rel_data = self.unwrap(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 269, in unwrap
    version = self.get_version(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 111, in get_version
    raise errors.UnversionedRelation(relation)
serialized_data_interface.errors.UnversionedRelation: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/charm_reconciler.py", line 92, in reconcile
    component_item.component.configure_charm(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 51, in configure_charm
    self._configure_app(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 88, in _configure_app
    self._configure_app_leader(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 221, in _configure_app_leader
    interface = self.get_interface()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 229, in get_interface
    return get_sdi_interface(self._charm, self._relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 297, in get_sdi_interface
    raise ErrorWithStatus(str(err), WaitingStatus) from err
charmed_kubeflow_chisme.exceptions._with_status.ErrorWithStatus: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api
unit-kfp-viz-0: 10:20:00 ERROR unit.kfp-viz/0.juju-log execute_components caught unhandled exception when executing configure_charm for relation:kfp-viz
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 294, in get_sdi_interface
    interface = get_interface(charm, relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 388, in get_interface
    instance.get_data()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 172, in get_data
    rel_data = self.unwrap(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 269, in unwrap
    version = self.get_version(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 111, in get_version
    raise errors.UnversionedRelation(relation)
serialized_data_interface.errors.UnversionedRelation: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/charm_reconciler.py", line 92, in reconcile
    component_item.component.configure_charm(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 51, in configure_charm
    self._configure_app(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 88, in _configure_app
    self._configure_app_leader(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 221, in _configure_app_leader
    interface = self.get_interface()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 229, in get_interface
    return get_sdi_interface(self._charm, self._relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 297, in get_sdi_interface
    raise ErrorWithStatus(str(err), WaitingStatus) from err
charmed_kubeflow_chisme.exceptions._with_status.ErrorWithStatus: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api
unit-kfp-api-0: 10:20:02 ERROR unit.kfp-api/0.juju-log kfp-viz:2: Failed to handle <RelationChangedEvent via KfpApiOperator/on/kfp_viz_relation_changed[76]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-api-0: 10:20:03 ERROR unit.kfp-api/0.juju-log object-storage:1: Failed to handle <RelationChangedEvent via KfpApiOperator/on/object_storage_relation_changed[81]> with error: List of <ops.model.Relation object-storage:1> versions not found for apps: minio
unit-kfp-viz-0: 10:20:05 ERROR unit.kfp-viz/0.juju-log kfp-viz:2: execute_components caught unhandled exception when executing configure_charm for relation:kfp-viz
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 294, in get_sdi_interface
    interface = get_interface(charm, relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 388, in get_interface
    instance.get_data()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 172, in get_data
    rel_data = self.unwrap(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 269, in unwrap
    version = self.get_version(relation)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/serialized_data_interface/sdi.py", line 111, in get_version
    raise errors.UnversionedRelation(relation)
serialized_data_interface.errors.UnversionedRelation: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/charm_reconciler.py", line 92, in reconcile
    component_item.component.configure_charm(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 51, in configure_charm
    self._configure_app(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/component.py", line 88, in _configure_app
    self._configure_app_leader(event)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 221, in _configure_app_leader
    interface = self.get_interface()
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 229, in get_interface
    return get_sdi_interface(self._charm, self._relation_name)
  File "/var/lib/juju/agents/unit-kfp-viz-0/charm/venv/charmed_kubeflow_chisme/components/serialised_data_interface_components.py", line 297, in get_sdi_interface
    raise ErrorWithStatus(str(err), WaitingStatus) from err
charmed_kubeflow_chisme.exceptions._with_status.ErrorWithStatus: List of <ops.model.Relation kfp-viz:2> versions not found for apps: kfp-api
unit-kfp-api-0: 10:20:08 ERROR unit.kfp-api/0.juju-log object-storage:1: Failed to generate container configuration.
unit-kfp-api-0: 10:20:08 ERROR unit.kfp-api/0.juju-log object-storage:1: Failed to handle <RelationChangedEvent via KfpApiOperator/on/object_storage_relation_changed[86]> with error: Waiting for kfp-viz relation data
unit-kfp-api-0: 10:20:11 ERROR unit.kfp-api/0.juju-log object-storage:1: Failed to generate container configuration.
unit-kfp-api-0: 10:20:11 ERROR unit.kfp-api/0.juju-log object-storage:1: Failed to handle <RelationChangedEvent via KfpApiOperator/on/object_storage_relation_changed[91]> with error: Waiting for kfp-viz relation data
unit-kfp-api-0: 10:22:59 ERROR unit.kfp-api/0.juju-log relational-db:6: Relation mysql is deprecated. Remove deprecated mysql relation to unblock.
unit-kfp-api-0: 10:23:00 ERROR unit.kfp-api/0.juju-log relational-db:6: Relation mysql is deprecated. Remove deprecated mysql relation to unblock.
unit-kfp-api-0: 10:23:01 ERROR unit.kfp-api/0.juju-log relational-db:6: Relation mysql is deprecated. Remove deprecated mysql relation to unblock.
unit-kfp-api-0: 10:23:01 ERROR unit.kfp-api/0.juju-log relational-db:6: Relation mysql is deprecated. Remove deprecated mysql relation to unblock.
unit-kfp-api-0: 10:23:33 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to generate container configuration.
unit-kfp-api-0: 10:23:33 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to handle <RelationDepartedEvent via KfpApiOperator/on/mysql_relation_departed[132]> with error: Please add required database relation: eg. relational-db
unit-kfp-api-0: 10:23:34 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to generate container configuration.
unit-kfp-api-0: 10:23:34 ERROR unit.kfp-api/0.juju-log mysql:0: Failed to handle <RelationBrokenEvent via KfpApiOperator/on/mysql_relation_broken[137]> with error: Please add required database relation: eg. relational-db
unit-kfp-api-0: 10:23:52 ERROR unit.kfp-api/0.juju-log relational-db:7: Failed to generate container configuration.
unit-kfp-api-0: 10:23:52 ERROR unit.kfp-api/0.juju-log relational-db:7: Failed to handle <RelationJoinedEvent via KfpApiOperator/on/relational_db_relation_joined[147]> with error: Waiting for relational-db data
unit-kfp-api-0: 10:24:17 ERROR unit.kfp-api/0.juju-log mysql:8: Relation mysql is deprecated. Remove deprecated mysql relation to unblock.
unit-kfp-api-0: 10:24:18 ERROR unit.kfp-api/0.juju-log mysql:8: Relation mysql is deprecated. Remove deprecated mysql relation to unblock.
application-kfp-db: 10:24:37 ERROR juju.worker.caasoperator.runner exited "kfp-db/0": relation scope id "60e4f333-4289-4b6f-8dd5-8bea9986a1ea:r#8#kfp-api": settings 60e4f333-4289-4b6f-8dd5-8bea9986a1ea:r#8#kfp-api not found (not found)
unit-kfp-api-0: 10:24:37 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: relation scope id "60e4f333-4289-4b6f-8dd5-8bea9986a1ea:r#8#kfp-db": settings 60e4f333-4289-4b6f-8dd5-8bea9986a1ea:r#8#kfp-db not found (not found)
unit-kfp-api-0: 10:30:06 ERROR unit.kfp-api/0.juju-log relational-db:7: Failed to handle <RelationDepartedEvent via KfpApiOperator/on/relational_db_relation_departed[213]> with error: Please add required relation object-storage
unit-kfp-api-0: 10:30:07 ERROR unit.kfp-api/0.juju-log relational-db:7: Failed to handle <RelationBrokenEvent via KfpApiOperator/on/relational_db_relation_broken[218]> with error: Please add required relation object-storage
unit-kfp-api-0: 10:30:08 ERROR unit.kfp-api/0.juju-log Failed to handle <PebbleReadyEvent via KfpApiOperator/on/apiserver_pebble_ready[223]> with error: Please add required relation object-storage

INFO     pytest_operator.plugin:plugin.py:855 Forgetting main...
ERROR    websockets.protocol:protocol.py:881 Error in data transfer
Traceback (most recent call last):
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/websockets/protocol.py", line 827, in transfer_data
    message = await self.read_message()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/websockets/protocol.py", line 895, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/websockets/protocol.py", line 971, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/websockets/protocol.py", line 1047, in read_frame
    frame = await Frame.read(
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/websockets/framing.py", line 105, in read
    data = await reader(2)
  File "/usr/lib/python3.8/asyncio/streams.py", line 723, in readexactly
    await self._wait_for_data('readexactly')
  File "/usr/lib/python3.8/asyncio/streams.py", line 517, in _wait_for_data
    await self._waiter
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 910, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

=================================== FAILURES ===================================
________________ TestCharm.test_prometheus_grafana_integration _________________
Traceback (most recent call last):
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_hooks.py", line 493, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_manager.py", line 115, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_callers.py", line 152, in _multicall
    return outcome.get_result()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_result.py", line 114, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_callers.py", line 77, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/_pytest/runner.py", line 177, in pytest_runtest_call
    raise e
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/_pytest/python.py", line 1792, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_hooks.py", line 493, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_manager.py", line 115, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_callers.py", line 152, in _multicall
    return outcome.get_result()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_result.py", line 114, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pluggy/_callers.py", line 77, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/pytest_asyncio/plugin.py", line 532, in inner
    _loop.run_until_complete(task)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/runner/work/kfp-operators/kfp-operators/charms/kfp-api/tests/integration/test_charm.py", line 250, in test_prometheus_grafana_integration
    for attempt in self.retry_for_5_attempts:
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/tenacity/__init__.py", line 347, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/home/runner/work/kfp-operators/kfp-operators/charms/kfp-api/tests/integration/test_charm.py", line 264, in test_prometheus_grafana_integration
    response_metric = response["data"]["result"][0]["metric"]
IndexError: list index out of range
------------------------------ Captured log call -------------------------------
INFO     juju.model:model.py:2069 Deploying ch:amd64/focal/prometheus-k8s-159
INFO     juju.model:model.py:2069 Deploying ch:amd64/focal/grafana-k8s-93
INFO     juju.model:model.py:2069 Deploying ch:amd64/focal/prometheus-scrape-config-k8s-44
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
WARNING  juju.model:model.py:1558 relate is deprecated and will be removed. Use integrate instead.
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [allocating] waiting: installing agent
  prometheus-k8s/0 [allocating] waiting: installing agent
  prometheus-scrape-config-k8s/0 [allocating] waiting: installing agent
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [allocating] waiting: installing agent
  prometheus-k8s/0 [executing] maintenance: installing charm software
  prometheus-scrape-config-k8s/0 [idle] active: 
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [executing] maintenance: installing charm software
  prometheus-k8s/0 [executing] active: 
INFO     juju.model:model.py:2759 Waiting for model:
  grafana-k8s/0 [idle] active: 
INFO     test_charm:test_charm.py:248 Prometheus available at http://10.1.88.25:9090
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 1)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 2)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 3)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 4)
INFO     test_charm:test_charm.py:261 Response status is success
INFO     test_charm:test_charm.py:251 Testing prometheus deployment (attempt 5)
INFO     test_charm:test_charm.py:261 Response status is success
=============================== warnings summary ===============================
tests/integration/test_charm.py: 10 warnings
  /home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/model.py:1558: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
    log.warn("relate is deprecated and will be removed. Use integrate instead.")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/integration/test_charm.py::TestCharm::test_prometheus_grafana_integration - IndexError: list index out of range
============= 1 failed, 5 passed, 10 warnings in 945.42s (0:15:45) =============
Task was destroyed but it is pending!
task: <Task pending name='Task_Pinger' coro=<Connection._pinger() running at /home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/connection.py:613> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f73544d6e20>()]> cb=[gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769, gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769]>
Task was destroyed but it is pending!
task: <Task pending name='Task_Receiver' coro=<Connection._receiver() running at /home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/connection.py:570> wait_for=<Future finished result=None> cb=[gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769, gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769]>
Exception ignored in: <coroutine object Connection._pinger.<locals>._do_ping at 0x7f735213edc0>
Traceback (most recent call last):
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/connection.py", line 606, in _do_ping
    await pinger_facade.Ping()
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/facade.py", line 486, in wrapper
    reply = await f(*args, **kwargs)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/_client1.py", line 9012, in Ping
    reply = await self.rpc(msg)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/facade.py", line 659, in rpc
    result = await self.connection.rpc(msg, encoder=TypeEncoder)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/connection.py", line 667, in rpc
    result = await self._recv(msg['request-id'])
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/connection.py", line 482, in _recv
    return await self.messages.get(request_id)
  File "/home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/utils.py", line 108, in get
    value = await self._queues[id].get()
  File "/usr/lib/python3.8/asyncio/queues.py", line 165, in get
    getter.cancel()  # Just in case getter is not done yet.
  File "/usr/lib/python3.8/asyncio/base_events.py", line 719, in call_soon
    self._check_closed()
  File "/usr/lib/python3.8/asyncio/base_events.py", line 508, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Task was destroyed but it is pending!
task: <Task pending name='Task-7299' coro=<Connection._pinger.<locals>._do_ping() done, defined at /home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/client/connection.py:604> wait_for=<Future cancelled> cb=[create_task_with_handler.<locals>._task_result_exp_handler(task_name='tmp', logger=<Logger juju....ion (WARNING)>)() at /home/runner/work/kfp-operators/kfp-operators/.tox/integration/lib/python3.8/site-packages/juju/jasyncio.py:39, gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769]>
Task was destroyed but it is pending!
task: <Task pending name='Task-7300' coro=<Event.wait() done, defined at /usr/lib/python3.8/asyncio/locks.py:296> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f73544d6c40>()]> cb=[gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769]>
Task was destroyed but it is pending!
task: <Task pending name='Task-7212' coro=<Event.wait() running at /usr/lib/python3.8/asyncio/locks.py:309> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f735180e910>()]> cb=[gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769, gather.<locals>._done_callback() at /usr/lib/python3.8/asyncio/tasks.py:769]>
integration: exit 1 (946.17 seconds) /home/runner/work/kfp-operators/kfp-operators/charms/kfp-api> pytest -vv --tb native --asyncio-mode=auto /home/runner/work/kfp-operators/kfp-operators/charms/kfp-api/tests/integration --log-cli-level=INFO -s --model kubeflow pid=143038
  integration: FAIL code 1 (963.19=setup[17.02]+cmd[946.17] seconds)
  evaluation failed :( (963.25 seconds)
kfp-api-integration: exit 1 (963.37 seconds) /home/runner/work/kfp-operators/kfp-operators> tox -c charms/kfp-api -e integration -- --model kubeflow pid=142342
  kfp-api-integration: FAIL code 1 (964.28=setup[0.91]+cmd[963.37] seconds)
  evaluation failed :( (964.36 seconds)

Additional Context

No response

orfeas-k commented 7 months ago

Same issue as https://github.com/canonical/istio-operators/issues/384

kimwnasptd commented 3 months ago

Bumped into this in a PR that didn't update any code: https://github.com/canonical/kfp-operators/pull/393 (CI run: https://github.com/canonical/kfp-operators/actions/runs/10366700320/job/28696539467?pr=393)

@orfeas-k do we know what causes this flakiness?

orfeas-k commented 2 months ago

@kimwnasptd No, I haven't investigated this. IIRC, increasing the timeout (or the number of retries) has resolved this in the past, so it could be caused by charms/integration not being ready yet.
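
For illustration, here is a minimal sketch of the "retry until ready" idea. It assumes the failure mode suggested by the captured log: all five attempts logged "Response status is success", yet `response["data"]["result"]` was still empty on the last one. The names `wait_for_metric` and `fetch`, and the attempt/delay defaults, are hypothetical and not taken from `test_charm.py`; the real test retries through tenacity, whereas this sketch uses only the standard library so it can be run on its own.

```python
import time
from typing import Any, Callable, Dict


def wait_for_metric(
    fetch: Callable[[], Dict[str, Any]],
    attempts: int = 10,            # hypothetical default, not the test's value
    delay_seconds: float = 30.0,   # hypothetical default, not the test's value
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a Prometheus-style query until it returns at least one series.

    `fetch` is any callable returning the decoded JSON response. A response
    with status "success" but an empty "result" list is treated as
    "not ready yet" and retried, instead of being indexed directly.
    """
    for attempt in range(1, attempts + 1):
        response = fetch()
        results = response.get("data", {}).get("result", [])
        if response.get("status") == "success" and results:
            return results[0]["metric"]
        if attempt < attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"no metric series returned after {attempts} attempts")
```

With a check like this, an empty result surfaces as an explicit timeout rather than an `IndexError`, which would make it easier to tell "charms not ready yet" apart from a genuinely malformed response.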