canonical / grafana-agent-operator

https://charmhub.io/grafana-agent
Apache License 2.0

Charmed PostgreSQL alert rules failed to upload via grafana-agent from edge (revision 299), works well with stable (revision 216) #198

Closed taurus-forever closed 3 days ago

taurus-forever commented 3 weeks ago

Bug Description

Hi,

Alert rules cannot be uploaded to COS when grafana-agent is deployed from the edge channel:

I also tested other combinations; all alert rules work well for:

The error message from COS logs:

unit-prometheus-0: 22:32:40 ERROR unit.prometheus/0.juju-log receive-remote-write:29: Invalid alert rule file: error validating /tmp/tmpnuk9rkg9/validate_rule.yaml: [group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 1, "HostDown": annotation "summary": template: __alert_HostDown:1: function "labels" not defined group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 2, "HostUnavailable": annotation "summary": template: __alert_HostUnavailable:1: function "labels" not defined]
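
A note on reading this error: `function "labels" not defined` is the message Go's text/template gives when a template action starts with a bare identifier, which it treats as a function name. Prometheus alert templates expose the labels as the variable `$labels`, so one possible cause (an assumption, not confirmed in this report) is that the `$` was lost somewhere before the rule reached Prometheus. A minimal sketch for checking an annotation string for that pattern follows; the helper name `find_bare_template_vars` and the sample strings are hypothetical.

```python
import re

# Matches a template action that begins with a bare `labels` or `value`
# identifier, e.g. "{{ labels.instance }}", but not "{{ $labels.instance }}".
# Only the start of each action is checked; this is a heuristic, not a parser.
BARE_VAR = re.compile(r"\{\{-?\s*(labels|value)\b")


def find_bare_template_vars(text):
    """Return the bare identifiers found at the start of template actions."""
    return [match.group(1) for match in BARE_VAR.finditer(text)]


# Hypothetical annotation strings, not taken from the actual rule files.
broken = "Host {{ labels.instance }} is down"
valid = "Host {{ $labels.instance }} is down"

print(find_bare_template_vars(broken))  # ['labels']
print(find_bare_template_vars(valid))   # []
```

Running this over the `summary` annotations of the rules that grafana-agent forwards would show whether the `$` is already missing on the agent side.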

The PostgreSQL charm alert rules are here: https://github.com/canonical/postgresql-operator/tree/main/src/prometheus_alert_rules
I cannot explain why the stable grafana-agent uploads them well, while the edge one triggers an error at the Prometheus level.

To Reproduce

juju switch microk8s && juju add-model cos2
juju deploy cos-lite --trust --channel edge
juju offer grafana:grafana-dashboard grafana
juju offer loki:logging loki
juju offer prometheus:receive-remote-write prometheus

juju switch lxd && juju add-model postgresql
juju deploy postgresql
juju deploy grafana-agent --channel edge # use stable channel to test the working case
juju relate postgresql grafana-agent
juju consume microk8s:admin/cos2.loki
juju consume microk8s:admin/cos2.grafana
juju consume microk8s:admin/cos2.prometheus
juju integrate grafana-agent grafana
juju integrate grafana-agent loki
juju integrate grafana-agent prometheus

Environment

juju              3.5.4        28520  3.5/stable          canonical✓     -
lxd               6.1-78a3d8f  30130  latest/stable       canonical✓     -
microk8s          v1.28.14     7228   1.28-strict/stable  canonical✓     -

postgresql charm revision 468 from 14/stable
grafana-agent revision 216 works well
grafana-agent revision 299 does not work

COS-lite from latest/edge:
App           Version  Status  Scale  Charm             Channel      Rev  Address         Exposed  Message
alertmanager  0.27.0   active      1  alertmanager-k8s  latest/edge  138  10.152.183.19   no       
catalogue              active      1  catalogue-k8s     latest/edge   68  10.152.183.162  no       
grafana       9.5.3    active      1  grafana-k8s       latest/edge  121  10.152.183.22   no       
loki          2.9.6    active      1  loki-k8s          latest/edge  174  10.152.183.216  no       
prometheus    2.52.0   active      1  prometheus-k8s    latest/edge  214  10.152.183.215  no       
traefik       2.11.0   active      1  traefik-k8s       latest/edge  213  10.152.183.53   no       Serving at 10.76.203.225

Relevant log output

unit-grafana-0: 22:32:35 INFO unit.grafana/0.juju-log grafana-source:12: Restarted grafana-k8s
unit-grafana-0: 22:32:35 INFO juju.worker.uniter.operation ran "grafana-source-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:37 WARNING unit.grafana/0.juju-log grafana-dashboard:27: Provided Redirect URL uses http scheme. Don't do this in production
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/cos2/statefulsets/grafana "HTTP/1.1 200 OK"
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos2/pods/grafana-0 "HTTP/1.1 200 OK"
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: reqs=ResourceRequirements(claims=None, limits={}, requests={'cpu': '0.25', 'memory': '200Mi'}), templated=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'}), actual=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'})
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/cos2/statefulsets/grafana "HTTP/1.1 200 OK"
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos2/pods/grafana-0 "HTTP/1.1 200 OK"
unit-prometheus-0: 22:32:38 WARNING unit.prometheus/0.receive-remote-write-relation-changed rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
unit-prometheus-0: 22:32:38 WARNING unit.prometheus/0.juju-log receive-remote-write:29: <class '__main__.PrometheusCharm'>.server_ca_cert_path is None; sending traces over INSECURE connection.
unit-grafana-0: 22:32:39 INFO unit.grafana/0.juju-log grafana-dashboard:27: Restarted grafana-k8s
unit-grafana-0: 22:32:39 INFO unit.grafana/0.juju-log grafana-dashboard:27: Initializing dashboard provisioning path
unit-prometheus-0: 22:32:40 ERROR unit.prometheus/0.juju-log receive-remote-write:29: Invalid alert rule file: error validating /tmp/tmpnuk9rkg9/validate_rule.yaml: [group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 1, "HostDown": annotation "summary": template: __alert_HostDown:1: function "labels" not defined group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 2, "HostUnavailable": annotation "summary": template: __alert_HostUnavailable:1: function "labels" not defined]
unit-prometheus-0: 22:32:40 INFO unit.prometheus/0.juju-log receive-remote-write:29: Prometheus (re)started
unit-grafana-0: 22:32:40 INFO unit.grafana/0.juju-log grafana-dashboard:27: Restarted grafana-k8s
unit-prometheus-0: 22:32:40 INFO juju.worker.uniter.operation ran "receive-remote-write-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:41 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:42 INFO juju.worker.uniter.operation ran "grafana-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:44 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-traefik-0: 22:33:14 INFO unit.traefik/0.juju-log Kubernetes service 'traefik' patched successfully
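
To make the single ERROR line above easier to read, here is a small sketch that splits the "Invalid alert rule file" message into one entry per failing rule. The regular expression is written against the message exactly as it appears in this log and is an assumption about its shape, not a documented format; the name `failing_rules` is hypothetical.

```python
import re

# One failing rule inside the "Invalid alert rule file" message.
RULE_ERROR = re.compile(
    r'group "(?P<group>[^"]+)", rule (?P<index>\d+), "(?P<rule>[^"]+)": '
    r'annotation "(?P<annotation>[^"]+)": template: \S+ '
    r'function "(?P<function>[^"]+)" not defined'
)


def failing_rules(log_line):
    """Return a dict per failing rule found in a juju debug-log line."""
    return [match.groupdict() for match in RULE_ERROR.finditer(log_line)]


# The ERROR line from the log above, split across string literals.
line = (
    'unit-prometheus-0: 22:32:40 ERROR unit.prometheus/0.juju-log receive-remote-write:29: '
    'Invalid alert rule file: error validating /tmp/tmpnuk9rkg9/validate_rule.yaml: '
    '[group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 1, "HostDown": '
    'annotation "summary": template: __alert_HostDown:1: function "labels" not defined '
    'group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 2, "HostUnavailable": '
    'annotation "summary": template: __alert_HostUnavailable:1: function "labels" not defined]'
)

for entry in failing_rules(line):
    print(entry["group"], entry["rule"], entry["annotation"], entry["function"])
```

Both entries belong to the group `test18_015dad2a_grafana-agent_HostHealth_alerts`, which suggests (without proving it) that the rejected rules are grafana-agent's own HostHealth alerts rather than the PostgreSQL rules.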

Additional context

No response