canonical / alertmanager-k8s-operator

https://charmhub.io/alertmanager-k8s
Apache License 2.0

Alertmanager fails hook "config-changed" due to assert private_key is not None #208

Closed (amc94 closed 8 months ago)

amc94 commented 11 months ago

Bug Description

In test run: https://solutions.qa.canonical.com/testruns/f5482205-0dd8-4ab8-9da7-32485bdb5158, deployment fails because alertmanager/1 is in error state:

App           Version  Status   Scale  Charm                     Channel  Rev  Address         Exposed  Message
alertmanager  0.25.0   waiting      2  alertmanager-k8s          stable    96  10.152.183.166  no       waiting for units to settle down
avalanche              active       2  avalanche-k8s             edge      36  10.152.183.172  no       
ca                     active       1  self-signed-certificates  edge      52  10.152.183.249  no       
catalogue              active       1  catalogue-k8s             stable    31  10.152.183.198  no       
external-ca            active       1  self-signed-certificates  edge      52  10.152.183.47   no       
grafana       9.2.1    active       1  grafana-k8s               stable    93  10.152.183.196  no       
loki          2.7.4    active       1  loki-k8s                  stable   105  10.152.183.89   no       
prometheus    2.47.2   active       1  prometheus-k8s            stable   156  10.152.183.39   no       
traefik       2.10.4   active       1  traefik-k8s               stable   166  10.246.167.226  no       

Unit             Workload  Agent  Address      Ports  Message
alertmanager/0*  active    idle   10.1.250.76         
alertmanager/1   error     idle   10.1.198.73         hook failed: "config-changed"
avalanche/0*     active    idle   10.1.250.73         
avalanche/1      active    idle   10.1.198.70         
ca/0*            active    idle   10.1.250.74         
catalogue/0*     active    idle   10.1.27.6           
external-ca/0*   active    idle   10.1.27.5           
grafana/0*       active    idle   10.1.198.74         
loki/0*          active    idle   10.1.27.7           
prometheus/0*    active    idle   10.1.250.79         
traefik/0*       active    idle   10.1.250.78

To Reproduce

1. Deploy the cos-lite bundle

Environment

The charm is latest/stable, rev 96.

Relevant log output

unit-alertmanager-1: 2023-12-14 23:04:05 DEBUG jujuc running hook tool "juju-log" for alertmanager/1-config-changed-8622918567248683281
unit-alertmanager-1: 2023-12-14 23:04:05 ERROR unit.alertmanager/1.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 561, in <module>
    main(AlertmanagerCharm)
  File "/var/lib/juju/agents/unit-alertmanager-1/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-alertmanager-1/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-alertmanager-1/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-alertmanager-1/charm/venv/ops/framework.py", line 833, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-alertmanager-1/charm/venv/ops/framework.py", line 922, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-alertmanager-1/charm/lib/charms/observability_libs/v0/cert_handler.py", line 220, in _on_config_changed
    self._generate_csr(renew=True)
  File "/var/lib/juju/agents/unit-alertmanager-1/charm/lib/charms/observability_libs/v0/cert_handler.py", line 240, in _generate_csr
    assert private_key is not None  # for type checker
AssertionError


Additional context

More logs, including crashdumps, can be found under: https://oil-jenkins.canonical.com/artifacts/f5482205-0dd8-4ab8-9da7-32485bdb5158/index.html

lucabello commented 9 months ago

We should update cert handler to use v1; if that doesn't fix it, we should investigate further :)

lucabello commented 9 months ago

I can only reproduce the issue when removing the application; I'm still upgrading cert handler to use v1, as it solves the issue in that situation.
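
For illustration, the failure mode in the traceback can be sketched as follows. This is a minimal sketch under assumptions, not the actual cert_handler code: `FakeCertHandler` and both method names are hypothetical, and the CSR is a placeholder string. It shows that an `assert private_key is not None` written for the type checker still raises `AssertionError` at runtime when no key is available, while an explicit guard lets the handler return quietly.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class FakeCertHandler:
    """Hypothetical stand-in for illustration; not the real cert_handler library."""

    def __init__(self, private_key: Optional[str] = None):
        # The key may legitimately be absent, e.g. before it is generated
        # or (as an assumption) while the application is being removed.
        self.private_key = private_key

    def generate_csr_with_assert(self) -> str:
        private_key = self.private_key
        # Narrows the type for the checker, but raises AssertionError
        # at runtime if the key is missing.
        assert private_key is not None  # for type checker
        return f"CSR({private_key})"  # placeholder, not a real CSR

    def generate_csr_with_guard(self) -> Optional[str]:
        private_key = self.private_key
        if private_key is None:
            # Skip instead of crashing the hook when there is no key yet.
            logger.debug("No private key available; skipping CSR generation.")
            return None
        return f"CSR({private_key})"  # placeholder, not a real CSR


if __name__ == "__main__":
    handler = FakeCertHandler()  # no private key set
    print(handler.generate_csr_with_guard())
    try:
        handler.generate_csr_with_assert()
    except AssertionError:
        print("assert-based variant raised AssertionError")
```

Whether a guard like this is the right fix depends on why the key is missing; it only turns the hook failure into a no-op and does not explain the underlying state.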