canonical / grafana-agent-operator

This charmed operator automates the operational procedures of running Grafana Agent, an open-source telemetry collector.
https://charmhub.io/grafana-agent
Apache License 2.0

Loki alert rules contain invalid label #55

Closed (closed by cbartz 1 month ago)

cbartz commented 5 months ago

Bug Description

Loki alert rules, both those defined by a charm integrated with the grafana agent and those defined in the grafana agent itself, contain the label juju_charm, although the logs are transmitted without this label. Furthermore, the logs carry no job label, so the alert rules defined in the grafana-agent, which match on job, are incorrect.
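
The mismatch can be checked mechanically. The following is a minimal sketch, not code from the charm: it pulls label names out of the stream selectors of a LogQL expression with a simple regex and reports the ones that are absent from a given set of stream labels. The helper names and the example stream label set are assumptions for illustration; only the absence of job and juju_charm is taken from this report.

```python
import re

# One stream selector, e.g. {job="varlogs"}. Simplification: assumes no
# braces appear inside the quoted matcher values.
_SELECTOR = re.compile(r"\{([^{}]*)\}")

# One label matcher inside a selector, e.g. job="varlogs" or job=~".+"
_MATCHER = re.compile(
    r'([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"((?:[^"\\]|\\.)*)"'
)


def selector_labels(expr: str) -> set:
    """Return the label names used in the stream selectors of a LogQL expression."""
    names = set()
    for selector in _SELECTOR.findall(expr):
        for name, _op, _value in _MATCHER.findall(selector):
            names.add(name)
    return names


def missing_labels(expr: str, stream_labels: set) -> set:
    """Return the labels the expression matches on that the streams do not carry."""
    return selector_labels(expr) - stream_labels


# ASSUMPTION: an illustrative label set for the shipped logs. The report only
# states that job and juju_charm are absent; the other names are guesses.
EXAMPLE_STREAM_LABELS = {
    "filename",
    "juju_model",
    "juju_model_uuid",
    "juju_application",
    "juju_unit",
}

# Expression taken from the log output in this report.
HIGH_ERROR_RATE = 'count_over_time({job="varlogs"} |= "error" [1h]) > 100'
```

Calling `missing_labels(HIGH_ERROR_RATE, EXAMPLE_STREAM_LABELS)` reports `job`. A matcher that requires a non-empty value on a label the streams do not carry, such as `job="varlogs"` or `job=~".+"`, selects no streams, so a rule built on it cannot fire.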

To Reproduce

  1. juju deploy github-runner (or any compatible machine charm)
  2. juju integrate github-runner grafana-agent

Go to the dashboard.

[Screenshot from 2024-02-02 09-10-15] [Screenshot from 2024-02-02 09-12-15] [image]

Environment

reproduced on multipass and openstack cloud with grafana-agent rev 37

Relevant log output

unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: Reading <property object at 0x7f05a0f02700> rule from /var/lib/juju/agents/unit-grafana-agent-44/charm/prometheus_alert_rules/juju_gh-runner_eb824e40_grafana-agent.rules
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: updated alert rules file /var/lib/juju/agents/unit-grafana-agent-44/charm/loki_alert_rules/juju_gh-runner_eb824e40_github-runner.rules
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: Could not locate cos-tool at: "cos-tool-amd64"
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: Skipping injection of juju topology as label matchers
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: `cos-tool` unavailable. Leaving expression unchanged: count_over_time(({filename=~"/var/log/juju/unit-.*"} |= "Failed to start the timer for reconciliation event")[1h]) > 0

unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: Reading alert rule from /var/lib/juju/agents/unit-grafana-agent-44/charm/loki_alert_rules/juju_gh-runner_eb824e40_github-runner.rules
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: `cos-tool` unavailable. Leaving expression unchanged: count_over_time(({job=~".+"})[30s]) > 100

unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: Reading alert rule from /var/lib/juju/agents/unit-grafana-agent-44/charm/loki_alert_rules/grafana_agent_high_rate.rule
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: `cos-tool` unavailable. Leaving expression unchanged: count_over_time({job="varlogs"} |= "error" [1h]) > 100
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: Reading alert rule from /var/lib/juju/agents/unit-grafana-agent-44/charm/loki_alert_rules/high_error_rate.rule
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: updated dashboard file /var/lib/juju/agents/unit-grafana-agent-44/charm/grafana_dashboards/juju_github_self-hosted_runner_metrics-cos-agent-github-runner-98.json
unit-grafana-agent-44: 09:01:33 DEBUG unit.grafana-agent/44.juju-log cos-agent:98: updated dashboard file /var/lib/juju/agents/unit-grafana-agent-44/charm/grafana_dashboards/juju_github_self-hosted_runner_metrics_(long-term)-cos-agent-github-runner-98.json

Additional context

No response

lucabello commented 3 months ago

Apparently, we only add the job label for scrape configs that come from the snap slots, here. We should likely add that job label here as well. This should fix the issue :)
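
A rough sketch of what adding the job label could look like, assuming the log scrape configs are Promtail-style dicts with `job_name` and `static_configs` entries. The function name `add_job_label` and the choice to reuse `job_name` as the label value are assumptions for illustration, not the charm's actual code or the fix that was eventually applied.

```python
import copy


def add_job_label(scrape_configs: list, default_job: str = "") -> list:
    """Return a copy of scrape_configs in which each static config has a job label.

    Hypothetical helper: the label value is taken from the scrape config's
    job_name, falling back to default_job when job_name is missing.
    """
    result = copy.deepcopy(scrape_configs)  # leave the caller's data untouched
    for config in result:
        job = config.get("job_name") or default_job
        if not job:
            continue  # nothing sensible to use as a label value
        for static in config.get("static_configs", []):
            labels = static.setdefault("labels", {})
            # Keep a job label that is already set; only fill in a missing one.
            labels.setdefault("job", job)
    return result


# Illustrative input in Promtail's static_configs shape.
example_configs = [
    {
        "job_name": "varlogs",
        "static_configs": [
            {"targets": ["localhost"], "labels": {"__path__": "/var/log/*.log"}}
        ],
    }
]
labelled = add_job_label(example_configs)
```

With the label present on the shipped streams, a selector such as `{job="varlogs"}` from the rules above would have something to match. Working on a deep copy keeps the sketch free of side effects on the configuration received from the related charm.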

cbartz commented 1 month ago

@lucabello Is there a timeframe for when this will be fixed? It currently prevents the use of Loki alert rules with the grafana agent.