Closed taurus-forever closed 3 days ago
Hi,
Alert rules cannot be uploaded to COS if grafana-agent is deployed from the edge channel:
I also tested other combinations; all alert rules work well for:
The error message from COS logs:
unit-prometheus-0: 22:32:40 ERROR unit.prometheus/0.juju-log receive-remote-write:29: Invalid alert rule file: error validating /tmp/tmpnuk9rkg9/validate_rule.yaml: [group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 1, "HostDown": annotation "summary": template: __alert_HostDown:1: function "labels" not defined group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 2, "HostUnavailable": annotation "summary": template: __alert_HostUnavailable:1: function "labels" not defined]
The PostgreSQL charm alert rules are here: https://github.com/canonical/postgresql-operator/tree/main/src/prometheus_alert_rules
I cannot explain why the stable grafana-agent uploads them well, while the edge one throws an error at the Prometheus level.
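A note on the error text: `function "labels" not defined` is what Go's text/template parser (which Prometheus uses for annotations) reports when an action refers to `labels` as a bare identifier instead of the `$labels` variable. The group name in the log mentions grafana-agent and HostHealth, so the affected rules may be ones shipped by grafana-agent rather than the PostgreSQL rules linked above, but that is only a guess from the log. Below is a minimal sketch for scanning rule text for such bare references. The function name and the sample fragment are hypothetical and not taken from either charm, and the pattern only catches an identifier at the very start of a `{{ ... }}` action.

```python
import re

# Matches an action such as "{{ labels.instance }}", where the identifier
# lacks the leading "$" that would make it a template variable.
# Only the start of the action is checked; this is a rough heuristic.
BARE_REF = re.compile(r"\{\{-?\s*(labels|value|externalLabels)\b")


def find_bare_refs(text):
    """Return (line_number, stripped_line) pairs that contain a bare reference."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if BARE_REF.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical rule fragment for illustration only.
sample = """\
annotations:
  summary: Host {{ labels.instance }} is down
  description: Host {{ $labels.instance }} is down
"""

for number, line in find_bare_refs(sample):
    print(f"line {number}: {line}")
```

Running this over the rule files as they arrive on the Prometheus side (rather than the source files in the charm repository) would show whether the `$` is lost somewhere between the charm and COS; whether that is the actual cause here is not confirmed.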
juju switch microk8s && juju add-model cos
juju deploy cos-lite --trust --channel edge
juju offer grafana:grafana-dashboard grafana
juju offer loki:logging loki
juju offer prometheus:receive-remote-write prometheus

juju switch lxd && juju add-model postgresql
juju deploy postgresql
juju deploy grafana-agent --channel edge # use stable channel to test the working case
juju relate postgresql grafana-agent
juju consume microk8s:admin/cos2.loki
juju consume microk8s:admin/cos2.grafana
juju consume microk8s:admin/cos2.prometheus
juju integrate grafana-agent grafana
juju integrate grafana-agent loki
juju integrate grafana-agent prometheus
juju      3.5.4        28520  3.5/stable          canonical✓  -
lxd       6.1-78a3d8f  30130  latest/stable       canonical✓  -
microk8s  v1.28.14     7228   1.28-strict/stable  canonical✓  -

postgresql charm revision 468 from 14/stable
grafana-agent revision 216 works well
grafana-agent revision 299 doesn't work

COS-lite from latest/edge:

App           Version  Status  Scale  Charm             Channel      Rev  Address         Exposed  Message
alertmanager  0.27.0   active      1  alertmanager-k8s  latest/edge  138  10.152.183.19   no
catalogue              active      1  catalogue-k8s     latest/edge   68  10.152.183.162  no
grafana       9.5.3    active      1  grafana-k8s       latest/edge  121  10.152.183.22   no
loki          2.9.6    active      1  loki-k8s          latest/edge  174  10.152.183.216  no
prometheus    2.52.0   active      1  prometheus-k8s    latest/edge  214  10.152.183.215  no
traefik       2.11.0   active      1  traefik-k8s       latest/edge  213  10.152.183.53   no       Serving at 10.76.203.225
unit-grafana-0: 22:32:35 INFO unit.grafana/0.juju-log grafana-source:12: Restarted grafana-k8s
unit-grafana-0: 22:32:35 INFO juju.worker.uniter.operation ran "grafana-source-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:37 WARNING unit.grafana/0.juju-log grafana-dashboard:27: Provided Redirect URL uses http scheme. Don't do this in production
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/cos2/statefulsets/grafana "HTTP/1.1 200 OK"
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos2/pods/grafana-0 "HTTP/1.1 200 OK"
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: reqs=ResourceRequirements(claims=None, limits={}, requests={'cpu': '0.25', 'memory': '200Mi'}), templated=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'}), actual=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'})
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/apis/apps/v1/namespaces/cos2/statefulsets/grafana "HTTP/1.1 200 OK"
unit-grafana-0: 22:32:38 INFO unit.grafana/0.juju-log grafana-dashboard:27: HTTP Request: GET https://10.152.183.1/api/v1/namespaces/cos2/pods/grafana-0 "HTTP/1.1 200 OK"
unit-prometheus-0: 22:32:38 WARNING unit.prometheus/0.receive-remote-write-relation-changed rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
unit-prometheus-0: 22:32:38 WARNING unit.prometheus/0.juju-log receive-remote-write:29: <class '__main__.PrometheusCharm'>.server_ca_cert_path is None; sending traces over INSECURE connection.
unit-grafana-0: 22:32:39 INFO unit.grafana/0.juju-log grafana-dashboard:27: Restarted grafana-k8s
unit-grafana-0: 22:32:39 INFO unit.grafana/0.juju-log grafana-dashboard:27: Initializing dashboard provisioning path
unit-prometheus-0: 22:32:40 ERROR unit.prometheus/0.juju-log receive-remote-write:29: Invalid alert rule file: error validating /tmp/tmpnuk9rkg9/validate_rule.yaml: [group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 1, "HostDown": annotation "summary": template: __alert_HostDown:1: function "labels" not defined group "test18_015dad2a_grafana-agent_HostHealth_alerts", rule 2, "HostUnavailable": annotation "summary": template: __alert_HostUnavailable:1: function "labels" not defined]
unit-prometheus-0: 22:32:40 INFO unit.prometheus/0.juju-log receive-remote-write:29: Prometheus (re)started
unit-grafana-0: 22:32:40 INFO unit.grafana/0.juju-log grafana-dashboard:27: Restarted grafana-k8s
unit-prometheus-0: 22:32:40 INFO juju.worker.uniter.operation ran "receive-remote-write-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:41 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:42 INFO juju.worker.uniter.operation ran "grafana-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:44 INFO juju.worker.uniter.operation ran "grafana-dashboard-relation-changed" hook (via hook dispatching script: dispatch)
unit-grafana-0: 22:32:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-traefik-0: 22:33:14 INFO unit.traefik/0.juju-log Kubernetes service 'traefik' patched successfully
Bug Description
To Reproduce
Environment
Relevant log output
Additional context
No response