canonical / istio-beacon-k8s-operator

https://charmhub.io/istio-beacon-k8s
Apache License 2.0

Set automatically-retry-hooks to true for istio charms' CI #8

Open IbraAoad opened 2 months ago

IbraAoad commented 2 months ago

Bug Description

Charms hold an open connection to the Juju controller pod. Whenever we add charms to the mesh, Istio reconfigures their networking at runtime and this connection gets reset. This puts the charms into an error state until the hook is retried and they regain the connection.

In CI, hook retries are disabled by default, so we need to enable them by setting automatically-retry-hooks to true.
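As a minimal sketch, assuming the CI drives the `juju` CLI, retries can be turned on for the test model through the `automatically-retry-hooks` model config key before the charms are deployed. The helper names `retry_hooks_command` and `enable_hook_retries` are hypothetical and not taken from this repo's CI.

```python
import shutil
import subprocess
from typing import List, Optional


def retry_hooks_command(enabled: bool, model: Optional[str] = None) -> List[str]:
    """Build a `juju model-config` call that sets automatically-retry-hooks.

    Hypothetical helper: the config key is a real Juju model setting,
    but this wrapper is only an illustration.
    """
    cmd = ["juju", "model-config"]
    if model is not None:
        # -m targets a specific model instead of the currently selected one
        cmd += ["-m", model]
    cmd.append(f"automatically-retry-hooks={'true' if enabled else 'false'}")
    return cmd


def enable_hook_retries(model: Optional[str] = None) -> None:
    """Turn hook retries on so hooks that fail on a mesh-induced reset get retried."""
    if shutil.which("juju") is None:
        raise RuntimeError("juju CLI not found on PATH")
    subprocess.run(retry_hooks_command(True, model), check=True)
```

In a CI setup step this would be called once per test model (the model name is a placeholder), before any charm joins the mesh.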

To Reproduce

Run this repo's integration tests (itests) against a Juju controller bootstrapped with automatically-retry-hooks=false.
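A hedged sketch of the reproduction setup: `juju bootstrap` accepts `--model-default key=value`, which applies the setting to every model the controller creates, and `juju model-config <key>` reads a single value back. It assumes that the single-key read prints just `true` or `false`, and the cloud name `microk8s`, controller name `istio-repro`, and helper names are illustrative only.

```python
import subprocess
from typing import List, Optional


def bootstrap_command(cloud: str, controller: str, retry_hooks: bool) -> List[str]:
    """Bootstrap a controller whose models default to the given retry setting."""
    value = "true" if retry_hooks else "false"
    # --model-default applies the key to every model this controller creates
    return ["juju", "bootstrap", cloud, controller,
            "--model-default", f"automatically-retry-hooks={value}"]


def parse_retry_setting(output: str) -> bool:
    """Interpret the output of `juju model-config automatically-retry-hooks`.

    Assumes the command prints only the value, e.g. "false".
    """
    value = output.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"unexpected automatically-retry-hooks value: {output!r}")
    return value == "true"


def current_retry_setting(model: Optional[str] = None) -> bool:
    """Read the setting back so the itests can confirm retries are really off."""
    cmd = ["juju", "model-config"]
    if model is not None:
        cmd += ["-m", model]
    cmd.append("automatically-retry-hooks")
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_retry_setting(out)
```

Checking the value before running the itests makes the reproduction deterministic: if `current_retry_setting()` returns `True`, the failures described above will usually be masked by retries.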

Environment

latest/edge

Relevant log output

ops.model.ModelError: ERROR codec.ReadHeader error: error receiving message: read tcp 10.1.25.78:41684->10.152.183.252:17070: read: connection reset by peer

model-f6c84618-17ab-475c-8648-af5f7821f850: 08:23:31 ERROR juju.worker.dependency "log-sender" manifold worker returned unexpected error: sending log message: read tcp 10.1.25.77:46646->10.152.183.252:17070: read: connection reset by peer: set tcp 10.1.25.77:46646: use of closed network connection
unit-istio-beacon-k8s-0: 08:23:31 ERROR juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1
model-f6c84618-17ab-475c-8648-af5f7821f850: 08:23:38 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: codec.ReadHeader error: error receiving message: read tcp 10.1.25.77:46644->10.152.183.252:17070: read: connection reset by peer
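To tell this failure mode apart from genuine charm bugs in CI logs, a rough sketch could flag hook failures that appear after a reset of the connection to the controller API (port 17070, Juju's default). It assumes debug-log lines formatted like the excerpt above; the regexes, the function name `mesh_reset_failures`, and the "any earlier reset" heuristic are illustrative, and the heuristic can over-attribute failures in long logs.

```python
import re
from typing import Iterable, List, Tuple

# A reset on a TCP connection to the controller API port (Juju default: 17070)
RESET_RE = re.compile(r"->[\d.]+:17070: read: connection reset by peer")
# e.g. unit-istio-beacon-k8s-0: ... hook "config-changed" (...) failed: exit status 1
HOOK_FAIL_RE = re.compile(r'^(unit-[\w-]+-\d+): .*hook "([\w-]+)" .*failed')


def mesh_reset_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (unit, hook) pairs for hook failures logged after a controller reset."""
    saw_reset = False
    failures: List[Tuple[str, str]] = []
    for line in lines:
        if RESET_RE.search(line):
            saw_reset = True
        match = HOOK_FAIL_RE.search(line)
        if match and saw_reset:
            failures.append((match.group(1), match.group(2)))
    return failures
```

Applied to the excerpt above, this would report the `config-changed` failure on `unit-istio-beacon-k8s-0`, which is the hook that a retry is expected to recover.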

Additional context

No response

ca-scribner commented 2 months ago

To confirm, are we saying that when ApplicationA is added to the service mesh then ApplicationA's existing connection with the controller is severed, but that other applications are unaffected?

ca-scribner commented 2 months ago

Yes, confirmed offline that this is correct. The application in question also goes into an error state with "agent lost", and the model operator's logs report the same thing.

dstathis commented 2 months ago

For now, let's consider this issue done once retries are enabled. We should, however, discuss with the Juju team the possibility of allowing a short grace window before a unit goes into an error state.

dstathis commented 2 months ago

As part of this issue, we also need to fix CI so that it points at the main branch of the observability repo.