canonical / grafana-agent-operator

This charmed operator automates the operational procedures of running Grafana Agent, an open-source telemetry collector.
https://charmhub.io/grafana-agent
Apache License 2.0

Stuck waiting for TLS certificate #80

Open samuelallan72 opened 3 months ago

samuelallan72 commented 3 months ago

Bug Description

Sometimes grafana-agent units (not necessarily all of them at once) get stuck in waiting status with the message "Waiting for TLS certificate.". This is related to the relation with grafana, and it resolves if I bounce (remove and re-add) the relation between grafana and grafana-agent.

To Reproduce

This was observed in a test environment while testing something else; I don't have a neat reproducer yet. Opening the issue in case others have also observed it, and will fill in more info if I have time to investigate further. :)

Environment

Not the complete environment, but this may help:

$ juju status grafana-agent
Model      Controller   Cloud/Region             Version  SLA          Timestamp
teststack  serverstack  serverstack/serverstack  3.4.0    unsupported  14:28:18+10:30

SAAS        Status  Store               URL
grafana     active  overcloud-microk8s  admin/cos.grafana
loki        active  overcloud-microk8s  admin/cos.loki
prometheus  active  overcloud-microk8s  admin/cos.prometheus

App                  Version  Status   Scale  Charm                 Channel       Rev  Exposed  Message
cinder               20.3.1   active       1  cinder                yoga/stable   664  no       Unit is ready
cinder-mysql-router  8.0.36   active       0  mysql-router          8.0/stable    137  no       Unit is ready
glance               24.2.1   active       1  glance                yoga/stable   594  no       Unit is ready
glance-mysql-router  8.0.36   active       0  mysql-router          8.0/stable    137  no       Unit is ready
grafana-agent                 waiting      7  grafana-agent         edge           77  no       Waiting for TLS certificate.
mysql                8.0.36   active       3  mysql-innodb-cluster  8.0/stable    107  no       Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
nova-compute         25.2.1   active       1  nova-compute          yoga/stable   722  no       Unit is ready
openstack-exporter            active       1  openstack-exporter                   32  no
ovn-chassis          22.03.3  active       0  ovn-chassis           22.03/stable  222  no       Unit is ready

Unit                      Workload  Agent  Machine  Public address  Ports     Message
cinder/0*                 active    idle   0        10.5.0.223      8776/tcp  Unit is ready
  cinder-mysql-router/0*  active    idle            10.5.0.223                Unit is ready
  grafana-agent/16*       waiting   idle            10.5.0.223                Waiting for TLS certificate.
glance/0*                 active    idle   1        10.5.1.75       9292/tcp  Unit is ready
  glance-mysql-router/0*  active    idle            10.5.1.75                 Unit is ready
  grafana-agent/17        active    idle            10.5.1.75
mysql/0                   active    idle   3        10.5.0.8                  Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
  grafana-agent/18        active    idle            10.5.0.8
mysql/1*                  active    idle   4        10.5.1.197                Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
  grafana-agent/20        active    idle            10.5.1.197
mysql/2                   active    idle   5        10.5.2.227                Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
  grafana-agent/19        active    idle            10.5.2.227
nova-compute/0*           active    idle   8        10.5.0.28                 Unit is ready
  grafana-agent/22        active    idle            10.5.0.28
  ovn-chassis/0*          active    idle            10.5.0.28                 Unit is ready
openstack-exporter/1*     active    idle   18       10.5.3.70
  grafana-agent/21        active    idle            10.5.3.70

Machine  State    Address     Inst id                               Base          AZ    Message
0        started  10.5.0.223  a2651ff9-a4e7-4b00-a303-01559d902f74  ubuntu@22.04  nova  ACTIVE
1        started  10.5.1.75   4dc9a492-c20e-49a8-a779-48283e9d7e57  ubuntu@22.04  nova  ACTIVE
3        started  10.5.0.8    5f641db1-3a58-4495-932c-6d11636c9f14  ubuntu@22.04  nova  ACTIVE
4        started  10.5.1.197  591c673f-f469-4c79-945c-7f22a929c2e1  ubuntu@22.04  nova  ACTIVE
5        started  10.5.2.227  900a92e2-b8dc-4159-bed7-cd3abc00167f  ubuntu@22.04  nova  ACTIVE
8        started  10.5.0.28   7e044527-5d11-44f0-bc19-bb597b54d4b5  ubuntu@22.04  nova  ACTIVE
18       started  10.5.3.70   0fcc5a0f-9a14-45a5-839c-fa31fad19d75  ubuntu@22.04  nova  ACTIVE
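
In case it helps anyone check whether they are hitting the same thing, here is a rough sketch for picking the affected units out of `juju status --format=json`. It assumes grafana-agent units show up under their principal unit's `subordinates` key, as in the tabular output above. The key names (`applications`, `units`, `subordinates`, `workload-status`) are my understanding of the JSON layout and should be checked against real output. `find_stuck_units` and `juju_status` are hypothetical helper names, not part of the charm.

```python
import json
import subprocess

STUCK_MESSAGE = "Waiting for TLS certificate."


def find_stuck_units(status, app="grafana-agent", message=STUCK_MESSAGE):
    """Return sorted names of `app` units that are waiting with `message`.

    `status` is the parsed output of `juju status --format=json`
    (key names are assumed; verify against your Juju version).
    """
    stuck = []
    for application in status.get("applications", {}).values():
        for unit_name, unit in application.get("units", {}).items():
            # Subordinate units are nested under their principal unit.
            candidates = {unit_name: unit, **unit.get("subordinates", {})}
            for name, data in candidates.items():
                workload = data.get("workload-status", {})
                if (
                    name.split("/")[0] == app
                    and workload.get("current") == "waiting"
                    and workload.get("message") == message
                ):
                    stuck.append(name)
    return sorted(stuck)


def juju_status():
    """Fetch the current model status as a dict (requires the juju CLI)."""
    result = subprocess.run(
        ["juju", "status", "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `find_stuck_units(juju_status())` against the model above should list only the units still showing the TLS message, which makes it easier to see whether bouncing the relation cleared all of them.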

Relevant log output

.

Additional context

No response

sed-i commented 3 months ago

For future reference: https://github.com/canonical/grafana-agent-operator/blob/16ae09be9d810bd7f2ba0441edcdc079d11a4418/src/grafana_agent.py#L523