canonical / grafana-agent-k8s-operator

This charmed operator automates the operational procedures of running Grafana Agent, an open-source telemetry collector.
https://charmhub.io/grafana-agent-k8s
Apache License 2.0

Wrong juju_topology in alert rule provided via `LogProxyConsumer` #275

Closed rgildein closed 6 months ago

rgildein commented 7 months ago

Bug Description

We are working on the tempest-k8s charm, which uses the LogProxyConsumer for log forwarding and for alert rules. If we relate grafana-agent-k8s:logging-provider to tempest-k8s:logging, the alert rule defined via the LogProxyConsumer gets the juju_topology of grafana-agent-k8s and not that of the tempest-k8s charm.

Example from my local deployment: [Screenshot from 2024-02-02 13-28-46]

To Reproduce

juju deploy tempest-k8s  # manually built
juju deploy grafana-agent-k8s

juju consume u1-k8s:cos.prometheus-receive-remote-write
juju consume u1-k8s:cos.grafana-dashboard
juju consume u1-k8s:cos.loki-logging
juju relate grafana-agent-k8s:grafana-dashboards-provider grafana-dashboard:grafana-dashboard
juju relate grafana-agent-k8s:logging-consumer loki-logging:logging
juju relate grafana-agent-k8s:send-remote-write prometheus-receive-remote-write:receive-remote-write

juju relate grafana-agent-k8s:logging-provider tempest-k8s:logging

Environment

tempest model:

cos model:

Relevant log output

No logs related to this were found.

Additional context

No response

sed-i commented 6 months ago

Thanks @rgildein for reporting this.

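I had some difficulties reproducing this:

* The path [provided](https://opendev.org/openstack/sunbeam-charms/src/commit/8dc3cdff4cf75e455357617cf2416fd0496e480a/charms/tempest-k8s/src/handlers.py#L511) to LogProxyConsumer, `alert_rules_path="src/loki_alert_rules",` does not exist in the charm dir, so the relevant relation data is empty.

* I added a dummy rules file but upgrade failed: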
unit-tmpst-0: 14:19:24.620 ERROR juju.worker.uniter.operation hook "upgrade-charm" (via hook dispatching script: dispatch) failed: exit status 1
unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm Traceback (most recent call last):
unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm   File "/var/lib/juju/agents/unit-tmpst-0/charm/./src/charm.py", line 32, in <module>
unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm     import ops_sunbeam.charm as sunbeam_charm
unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm ModuleNotFoundError: No module named 'ops_sunbeam'

Would you be able to point me to a commit where I can pack tempest from?

sed-i commented 6 months ago

In the meantime I'm trying to reproduce with:

graph LR
tempo ---|logging| grafana-agent ---|logging| loki

bundle: kubernetes
applications:
  ga:
    charm: grafana-agent-k8s
    channel: edge
    scale: 1
  loki:
    charm: loki-k8s
    channel: edge
    scale: 1
    trust: true
  tempo:
    charm: tempo-k8s
    channel: edge
    scale: 1
relations:
- - ga:logging-consumer
  - loki:logging
- - tempo:logging
  - ga:logging-provider

Promtail config has the correct labels:

$ juju ssh --container tempo tempo/0 cat /etc/promtail/promtail_config.yaml
# ...
  - labels:
      __path__: /var/log/tempo.log
      job: juju_welcome-k8s_bf35e241_tempo
      juju_application: tempo
      juju_charm: tempo-k8s
      juju_model: welcome-k8s
      juju_model_uuid: bf35e241-d1b5-4613-810a-759a7065d3ca
      juju_unit: tempo/0

And those same labels reach loki:

$ curl 10.1.166.102:3100/loki/api/v1/labels
{"status":"success","data":["filename","job","juju_application","juju_charm","juju_model","juju_model_uuid","juju_unit"]}

$ curl 10.1.166.102:3100/loki/api/v1/label/juju_application/values
{"status":"success","data":["tempo"]}

$ curl 10.1.166.102:3100/loki/api/v1/label/juju_unit/values
{"status":"success","data":["tempo/0"]}

If I manually add a line to the tempo log file,

$ juju ssh --container tempo tempo/0 "echo bloop >> /var/log/tempo.log"

then the series would appear in loki with the correct labels:

$ curl 10.1.166.102:3100/loki/api/v1/series                       
{"status":"success","data":[{"filename":"/var/log/tempo.log","juju_unit":"tempo/0","juju_model_uuid":"bf35e241-d1b5-4613-810a-759a7065d3ca","juju_model":"welcome-k8s","juju_charm":"tempo-k8s","juju_application":"tempo","job":"juju_welcome-k8s_bf35e241_tempo"}]}

sed-i commented 6 months ago

Related: https://github.com/canonical/cos-tool/issues/13

sed-i commented 6 months ago

Issue confirmed with:

graph LR
traefik ---|logging| grafana-agent ---|logging| loki
traefik ---|metrics-endpoint| grafana-agent ---|metrics-endpoint| prometheus

1. Traefik has alert rules in the simplified format, with a %juju_topology% placeholder.
2. When forwarded to grafana-agent relation data, %juju_topology% is replaced with {job=~\".+\"}.
3. When the rule is in turn forwarded to loki, it gets the grafana-agent labels.

Similar thing with prometheus rules coming from traefik via grafana-agent.
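
To make the failure mode concrete, here is a minimal sketch of the three steps. It is not the actual library or cos-tool code: every function name is hypothetical, the `{%juju_topology%}` rule syntax is an assumption based on how the placeholder is written in this thread, and the string handling is deliberately naive.

```python
PLACEHOLDER = "%juju_topology%"  # as written in this thread; the real token may differ


def matchers(topology: dict) -> str:
    """Render a topology dict as comma-separated label matchers."""
    return ", ".join(f'{key}="{value}"' for key, value in topology.items())


def fill_placeholder(expr: str, topology: dict) -> str:
    """Expected behaviour: the placeholder becomes the origin charm's labels."""
    return expr.replace(PLACEHOLDER, matchers(topology))


def blank_placeholder(expr: str) -> str:
    """Step 2: the placeholder is replaced by a catch-all selector."""
    return expr.replace(PLACEHOLDER, 'job=~".+"')


def inject_into_selector(expr: str, topology: dict) -> str:
    """Step 3 (assumed): the forwarder's labels are added to the first selector."""
    return expr.replace("{", "{" + matchers(topology) + ", ", 1)


# Illustrative rule expression and topologies, not taken from a real charm.
rule = 'count_over_time({%juju_topology%} |= "error" [5m]) > 0'
traefik = {"juju_application": "traefik", "juju_charm": "traefik-k8s"}
agent = {"juju_application": "grafana-agent", "juju_charm": "grafana-agent-k8s"}

expected = fill_placeholder(rule, traefik)
observed = inject_into_selector(blank_placeholder(rule), agent)

print(expected)
print(observed)
```

Once the placeholder has been swapped for the catch-all selector, nothing in the expression identifies traefik any more, so the next hop can only attach its own (grafana-agent) topology.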

rgildein commented 6 months ago

> Thanks @rgildein for reporting this.
>
> I had some difficulties reproducing this:
>
> * The path [provided](https://opendev.org/openstack/sunbeam-charms/src/commit/8dc3cdff4cf75e455357617cf2416fd0496e480a/charms/tempest-k8s/src/handlers.py#L511) to LogProxyConsumer, `alert_rules_path="src/loki_alert_rules",` does not exist in the charm dir, so the relevant relation data is empty.
>
> * I added a dummy rules file but upgrade failed:
> unit-tmpst-0: 14:19:24.620 ERROR juju.worker.uniter.operation hook "upgrade-charm" (via hook dispatching script: dispatch) failed: exit status 1
> unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm Traceback (most recent call last):
> unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm   File "/var/lib/juju/agents/unit-tmpst-0/charm/./src/charm.py", line 32, in <module>
> unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm     import ops_sunbeam.charm as sunbeam_charm
> unit-tmpst-0: 14:24:24.777 WARNING unit.tmpst/0.upgrade-charm ModuleNotFoundError: No module named 'ops_sunbeam'
>
> Would you be able to point me to a commit where I can pack tempest from?

You need to build the charm with `tox -e build -- tempest-k8s` and deploy it with `juju deploy ./charms/tempest-k8s/tempest-k8s.charm --resource tempest-image=ghcr.io/canonical/tempest:2023.2`.