canonical / sdcore-upf-k8s-operator

Kubernetes Charm for the SD-Core User Plane Function (UPF).
https://charmhub.io/sdcore-upf-k8s
Apache License 2.0

Integration tests are flaky because of an error at the `start` event #109

Closed (gruyaume closed 5 months ago)

gruyaume commented 6 months ago

Describe the bug

Integration tests are flaky because the `start` hook intermittently fails, leaving the unit in an error state. Here is an example CI run:

To Reproduce

  1. Run integration tests multiple times
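Since the failure is intermittent, a small loop that repeats the test command and counts non-zero exits can help put a number on the flakiness. This is only a sketch: the helper name `count_failures` is hypothetical, and the `tox -e integration` command in the usage comment is an assumption about how the tests are launched, not something confirmed in this issue.

```python
import subprocess


def count_failures(command, runs):
    """Run `command` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for attempt in range(1, runs + 1):
        # Each run is a fresh subprocess, so state does not leak between attempts.
        result = subprocess.run(command)
        if result.returncode != 0:
            failures += 1
            print(f"run {attempt}/{runs} failed (exit code {result.returncode})")
    return failures


# Hypothetical usage; substitute the repository's actual test command:
# failed = count_failures(["tox", "-e", "integration"], runs=10)
# print(f"{failed} of 10 runs failed")
```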

Expected behavior

Integration tests run reliably

Logs

INFO     juju.model:model.py:2957 Waiting for model:
  sdcore-upf-k8s/0 [idle] error: hook failed: "start"
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 DEBUG juju.worker.uniter [AGENT-STATUS] error: hook failed: "start"
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 DEBUG juju.worker.uniter.remotestate storage attachment change for sdcore-upf-k8s/0: {storage-config-0 {2 alive true /var/lib/juju/storage/config/0}}
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 DEBUG juju.worker.uniter.remotestate storage attachment change for sdcore-upf-k8s/0: {storage-shared-app-1 {2 alive true /var/lib/juju/storage/shared-app/0}}
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 DEBUG juju.worker.uniter.remotestate workloadEvent enqueued for sdcore-upf-k8s/0: 0
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 DEBUG juju.worker.uniter.remotestate workloadEvent enqueued for sdcore-upf-k8s/0: 1
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 INFO juju.worker.uniter awaiting error resolution for "start" hook
model-1a9263d9-23d8-4079-8a19-5d11729b3134: 2024-02-29 19:19:00 DEBUG juju.worker.caasadmission received admission request for sdcore-upf-k8s-0.17b86b939ef4126c of /v1, Kind=Event in namespace test-integration-t0k5
model-1a9263d9-23d8-4079-8a19-5d11729b3134: 2024-02-29 19:19:01 DEBUG juju.worker.caasadmission received admission request for sdcore-upf-k8s-0.17b86b939ef4126c of /v1, Kind=Event in namespace test-integration-t0k5
unit-sdcore-upf-k8s-0: 2024-02-29 19:19:01 DEBUG juju.worker.uniter.remotestate storage attachment change for sdcore-upf-k8s/0: {storage-config-0 {2 alive true /var/lib/juju/storage/config/0}}
unit-sdcore-upf-k8s-0: 2024-02-29 19:19:01 INFO juju.worker.uniter awaiting error resolution for "start" hook
unit-sdcore-upf-k8s-0: 2024-02-29 19:19:01 DEBUG juju.worker.uniter.remotestate storage attachment change for sdcore-upf-k8s/0: {storage-shared-app-1 {2 alive true /var/lib/juju/storage/shared-app/0}}
unit-sdcore-upf-k8s-0: 2024-02-29 19:19:01 INFO juju.worker.uniter awaiting error resolution for "start" hook
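To spot this failure in a long log without reading it line by line, the `[AGENT-STATUS] error: hook failed` lines can be counted per unit and hook. The sketch below assumes the log layout seen in the excerpt above (`unit-<name>: <date> <time> <LEVEL> <logger> <message>`); the function name `failed_hooks` is hypothetical, and other Juju versions may format these lines differently.

```python
import re
from collections import Counter

# Matches agent-status lines such as:
# unit-sdcore-upf-k8s-0: 2024-02-29 19:18:32 DEBUG juju.worker.uniter [AGENT-STATUS] error: hook failed: "start"
HOOK_FAILED = re.compile(
    r'^(?P<unit>unit-[\w-]+): (?P<timestamp>\S+ \S+) \w+ \S+ '
    r'\[AGENT-STATUS\] error: hook failed: "(?P<hook>[^"]+)"'
)


def failed_hooks(lines):
    """Count hook failures per (unit, hook) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HOOK_FAILED.match(line)
        if match:
            counts[(match["unit"], match["hook"])] += 1
    return counts


# Hypothetical usage with a saved log file:
# with open("debug.log") as log:
#     for (unit, hook), n in failed_hooks(log).items():
#         print(f"{unit}: hook {hook!r} failed {n} time(s)")
```

Lines that only say `awaiting error resolution` are deliberately ignored, so each failure is counted once rather than once per retry message.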

Environment

Gmerold commented 5 months ago

Submitted a bug to the Juju project: https://bugs.launchpad.net/juju/+bug/2059105

Switching from self-hosted runners to GitHub enterprise runners improved the situation significantly. As of March 26th, the scheduled nightly integration runs have passed for 10 days straight.

This doesn't mean the problem is gone, but at this point there's nothing more that can be done on the Charmed 5G side.