canonical / observability-libs

A collection of charm libraries curated by the Observability team.
https://charmhub.io/observability-libs
Apache License 2.0

Hook failed on "certificates-relation-broken" #81

Closed IbraAoad closed 4 months ago

IbraAoad commented 5 months ago

Bug Description

When a certificates-relation-broken event occurs, tls_certificates_interface/v2 emits an all_certificates_invalidated event. cert_handler handles this event by running this code, which tries to generate a new CSR and request a certificate. The request fails because the certificates relation no longer exists.
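The failure mode can be sketched without a Juju model. The snippet below is a minimal illustration, not the library's code: FakeModel, FakeRequirer and the guard flag are hypothetical stand-ins, and it assumes that a lookup of the relation returns None while the broken hook runs, which is what the error message in the log suggests.

```python
class FakeModel:
    """Hypothetical stand-in for the charm's model; tracks existing relations."""

    def __init__(self, relations=None):
        self._relations = dict(relations or {})

    def get_relation(self, name):
        # Returns None when no relation with that name exists.
        return self._relations.get(name)


class FakeRequirer:
    """Hypothetical stand-in for the TLS requirer; mimics the logged error."""

    def __init__(self, model, relation_name="certificates"):
        self.model = model
        self.relation_name = relation_name
        self.requests = []

    def request_certificate_creation(self, certificate_signing_request):
        if self.model.get_relation(self.relation_name) is None:
            raise RuntimeError(
                f"Relation {self.relation_name} does not exist - "
                "The certificate request can't be completed"
            )
        self.requests.append(certificate_signing_request)


def on_all_certificates_invalidated(requirer, csr, guard=True):
    """Hypothetical handler: request a new cert only if the relation exists."""
    if guard and requirer.model.get_relation(requirer.relation_name) is None:
        return False  # Relation is gone; nothing to request.
    requirer.request_certificate_creation(certificate_signing_request=csr)
    return True
```

With guard=False the handler reproduces the RuntimeError from the log when the relation is missing. With the guard it returns early instead of failing the hook.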

To Reproduce

  1. juju deploy self-signed-certificates & any cos-charm
  2. relate ca to any cos-charm over the certificates relation
  3. juju remove-relation ca:certificates alertmanager:certificates

Environment

Model     Controller  Cloud/Region        Version  SLA          Timestamp
cos-loki  k8s         microk8s/localhost  3.1.7    unsupported  16:55:37+02:00

App            Version  Status   Scale  Charm                     Channel  Rev  Address         Exposed  Message
alertmanager   0.26.0   waiting      1  alertmanager-k8s          stable   101  10.152.183.169  no       installing agent
ca                      active       1  self-signed-certificates  edge     117  10.152.183.167  no         

Unit              Workload  Agent  Address      Ports  Message
alertmanager/0*   error     idle   10.1.54.106         hook failed: "certificates-relation-broken" for ca:certificates
ca/0*             active    idle   10.1.54.66               

Integration provider                Requirer                     Interface              Type     Message
ca:certificates                     alertmanager:certificates    tls-certificates       regular  

Relevant log output

unit-ca-0: 16:51:34 INFO juju.worker.uniter.operation ran "certificates-relation-departed" hook (via hook dispatching script: dispatch)
unit-alertmanager-0: 16:51:35 INFO juju.worker.uniter.operation ran "certificates-relation-departed" hook (via hook dispatching script: dispatch)
unit-ca-0: 16:51:35 INFO juju.worker.uniter.operation ran "certificates-relation-broken" hook (via hook dispatching script: dispatch)
unit-alertmanager-0: 16:51:35 WARNING unit.alertmanager/0.juju-log certificates:28: 'app' expected but not received.
unit-alertmanager-0: 16:51:36 WARNING unit.alertmanager/0.juju-log certificates:28: 'app_name' expected in snapshot but not found.
unit-alertmanager-0: 16:51:36 INFO unit.alertmanager/0.juju-log certificates:28: Creating CSR for alertmanager-0 with DNS ['alertmanager-0.alertmanager-endpoints.cos-loki.svc.cluster.local'] and IPs []
unit-alertmanager-0: 16:51:36 ERROR unit.alertmanager/0.juju-log certificates:28: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 572, in <module>
    main(AlertmanagerCharm)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/main.py", line 456, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/tls_certificates_interface/v2/tls_certificates.py", line 1861, in _on_relation_broken
    self.on.all_certificates_invalidated.emit()
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/observability_libs/v0/cert_handler.py", line 420, in _on_all_certificates_invalidated
    self._generate_csr(overwrite=True, clear_cert=True)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/observability_libs/v0/cert_handler.py", line 272, in _generate_csr
    self.certificates.request_certificate_creation(certificate_signing_request=csr)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/tls_certificates_interface/v2/tls_certificates.py", line 1614, in request_certificate_creation
    raise RuntimeError(
RuntimeError: Relation certificates does not exist - The certificate request can't be completed

Additional context

No response

dstathis commented 5 months ago

Just hit this with traefik. +1

nsklikas commented 4 months ago

Not sure if this is the same error, but I am consistently getting this error when the relation is broken:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/./src/charm.py", line 297, in <module>
    main(GLAuthCharm)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/lib/charms/tls_certificates_interface/v3/tls_certificates.py", line 1840, in _on_relation_broken
    self.on.all_certificates_invalidated.emit()
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/lib/charms/observability_libs/v1/cert_handler.py", line 395, in _on_all_certificates_invalidated
    self._generate_csr(overwrite=True, clear_cert=True)
  File "/var/lib/juju/agents/unit-glauth-k8s-1/charm/lib/charms/observability_libs/v1/cert_handler.py", line 230, in _generate_csr
    raise RuntimeError(
RuntimeError: private key unset. call _generate_privkey() before you call this method.
unit-glauth-k8s-1: 11:42:22 ERROR juju.worker.uniter.operation hook "certificates-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1

This happens every time I scale my application's units down. I use the latest versions of cert_handler=v1 and tls_certificates=v3. This error does not happen on juju v3.2, but it always happens on v3.3 and v3.4. You should be able to replicate it by running the integration tests from this commit.
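The traceback above fails on a different precondition: the CSR is regenerated while no private key is set. A rough sketch of one way a handler could tolerate that, assuming the key may already be gone when the hook runs, is shown below. CertState and both functions are hypothetical and only mirror the names seen in the traceback. They are not the cert_handler implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CertState:
    """Hypothetical stand-in for the data kept per unit."""

    private_key: Optional[str] = None
    csr: Optional[str] = None
    cert: Optional[str] = None


def generate_csr(state: CertState) -> None:
    """Mimics the precondition that raises in the traceback."""
    if state.private_key is None:
        raise RuntimeError(
            "private key unset. call _generate_privkey() before you call this method."
        )
    # Placeholder value; a real CSR would be built from the key.
    state.csr = f"csr-for-{state.private_key}"


def on_all_certificates_invalidated(state: CertState) -> None:
    """Always drop stale cert data; regenerate the CSR only if a key exists."""
    state.cert = None
    state.csr = None
    if state.private_key is not None:
        generate_csr(state)
```

Separating "clear the stale certificate" from "request a new one" means the invalidation step cannot raise when a unit is being removed.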

przemeklal commented 4 months ago

I just hit the same issue on juju 3.4:

unit-alertmanager-0: 13:38:11 ERROR unit.alertmanager/0.juju-log certificates:30: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 597, in <module>
    main(AlertmanagerCharm)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/main.py", line 456, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 865, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 955, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/tls_certificates_interface/v2/tls_certificates.py", line 1863, in _on_relation_broken
    self.on.all_certificates_invalidated.emit()
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 865, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 955, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 532, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/observability_libs/v1/cert_handler.py", line 396, in _on_all_certificates_invalidated
    self._generate_csr(overwrite=True, clear_cert=True)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 532, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/lib/charms/observability_libs/v1/cert_handler.py", line 231, in _generate_csr
    raise RuntimeError(
RuntimeError: private key unset. call _generate_privkey() before you call this method.
unit-alertmanager-0: 13:38:12 ERROR juju.worker.uniter.operation hook "certificates-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1

Is there any workaround? It blocks the migration from self-signed-certificates to the new tls-* options introduced in traefik.