canonical / loki-k8s-operator

https://charmhub.io/loki-k8s
Apache License 2.0

Alerts have wrong source urls (GeneratorURL) #437

Open cbartz opened 4 months ago

cbartz commented 4 months ago

Bug Description

We have received alert notifications from Alertmanager that were triggered by Loki alert rules. We use the GeneratorURL in the alert notification template. Unfortunately, the generated links are wrong: they use the Loki external_url instead of the Grafana external_url. Below is an example of a wrong link we received, followed by the correct link using the Grafana external URL.

# wrong link
https://cos-ps6.is-devops.canonical.com/prod-cos-k8s-ps6-is-charms-loki-0/explore?left=%7B%22queries%22:%5B%7B%22expr%22%3A%22%28sum_over_time%28%7Bfilename%3D%5C%22%2Fvar%2Flog%2Fgithub-runner-metrics.log%5C%22%2C+juju_application%3D%5C%22grafana-agent%5C%22%2C+juju_charm%3D%5C%22grafana-agent%5C%22%2C+juju_model%3D%5C%22prod-github-runner-manager-ps6%5C%22%2C+juju_model_uuid%3D%5C%2230ee4f9c-efca-4ef6-85e0-11bd4ec0f1aa%5C%22%7D+%7C+json+event%3D%5C%22event%5C%22%2Ccrashed_runners%3D%5C%22crashed_runners%5C%22+%7C+event%3D%5C%22reconciliation%5C%22+%7C+unwrap+crashed_runners%5B1h%5D%29+%5Cu003e+0%29%22%2C%22queryType%22%3A%22range%22%7D%5D%7D
# correct link
https://cos-ps6.is-devops.canonical.com/prod-cos-k8s-ps6-is-charms-grafana/explore?left=%7B%22datasource%22:%22P30805665297C0350%22,%22queries%22:%5B%7B%22expr%22:%22%28sum_over_time%28%7Bfilename%3D%5C%22%2Fvar%2Flog%2Fgithub-runner-metrics.log%5C%22,%20juju_application%3D%5C%22grafana-agent%5C%22,%20juju_charm%3D%5C%22grafana-agent%5C%22,%20juju_model%3D%5C%22prod-github-runner-manager-ps6%5C%22,%20juju_model_uuid%3D%5C%2230ee4f9c-efca-4ef6-85e0-11bd4ec0f1aa%5C%22%7D%20%7C%20json%20event%3D%5C%22event%5C%22,crashed_runners%3D%5C%22crashed_runners%5C%22%20%7C%20event%3D%5C%22reconciliation%5C%22%20%7C%20unwrap%20crashed_runners%5B1h%5D%29%20%3E%200%29%22,%22queryType%22:%22range%22,%22refId%22:%22A%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D&orgId=1
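To see where the two links above differ, the URLs can be taken apart with the Python standard library. This is only a debugging sketch; the helper name `summarize_explore_link` is made up for illustration and is not part of the charm.

```python
import json
from urllib.parse import parse_qs, urlsplit


def summarize_explore_link(url: str) -> dict:
    """Return the parts of a Grafana-style explore link that matter here."""
    parts = urlsplit(url)
    # parse_qs percent-decodes the value and turns '+' into spaces.
    left = json.loads(parse_qs(parts.query)["left"][0])
    return {
        "host": parts.netloc,
        # Path prefix before "/explore": the ingress path of the application.
        "app_path": parts.path.rsplit("/explore", 1)[0],
        # The wrong link carries no datasource at all.
        "datasource": left.get("datasource"),
        "expr": left["queries"][0]["expr"],
    }
```

Applied to the two links above, `app_path` ends in `-loki-0` for the wrong link and in `-grafana` for the correct one, and only the correct link has a `datasource`.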

The cause seems to be that the charm writes the configuration incorrectly: the external_url in the ruler block should be set to the Grafana external URL, not the Loki one. The charm currently sets it to the Loki URL in https://github.com/canonical/loki-k8s-operator/blob/86fc551af60736ea407b211cc60a3bad56d88c06/src/charm.py#L394 and https://github.com/canonical/loki-k8s-operator/blob/86fc551af60736ea407b211cc60a3bad56d88c06/src/config_builder.py#L112

See https://grafana.com/docs/loki/v2.9.x/configure/#ruler and https://github.com/grafana/loki/pull/8500.

To fix this, the charm would probably need to detect whether there is an active Grafana deployment (e.g. via an integration) and, if there is none, not set the external_url at all (if this is possible).
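A minimal sketch of that idea, assuming the Grafana URL is somehow known to the charm (or `None` when no Grafana is related). The function `build_ruler_config` and its parameters are hypothetical and do not mirror the actual code in `config_builder.py`; only the `alertmanager_url` and `external_url` keys come from the Loki ruler configuration.

```python
from typing import Optional


def build_ruler_config(alertmanager_url: str, grafana_external_url: Optional[str]) -> dict:
    """Build a partial Loki `ruler` block (hypothetical helper).

    external_url is only written when a Grafana URL is available, so that
    GeneratorURL never points at Loki itself.
    """
    ruler = {"alertmanager_url": alertmanager_url}
    if grafana_external_url:
        # Strip a trailing slash so Loki can append its own path cleanly.
        ruler["external_url"] = grafana_external_url.rstrip("/")
    return ruler
```

Leaving the key out when no Grafana is present is an assumption about the desired fallback; the issue itself leaves open whether that is possible.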

To Reproduce

Deploy Loki and Alertmanager, integrate them, and add an alert rule that fires. Use .GeneratorURL in the notification template of a receiver configuration in Alertmanager, e.g. as follows:

- name: 'mattermost-notifications'
  slack_configs:
    - send_resolved: true
      api_url: ...
      title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}  {{- if gt (len .CommonLabels) (len .GroupLabels) -}} {{" "}}( {{- with .CommonLabels.Remove .GroupLabels.Names }}  {{- range $index, $label := .SortedPairs -}}    {{ if $index }}, {{ end }}    {{- $label.Name }}="{{ $label.Value -}}"  {{- end }} {{- end -}} ) {{- end }}'
      text: |
        {{ range .Alerts -}}
        *Alert:* {{ .Annotations.summary }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}

        *Description:* {{ .Annotations.description }}

        *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}

        *Source*:  {{ .GeneratorURL }}
        {{ end }}

Trigger the alert and notice that the links for Source do not work because they use the external_url of Loki instead of that of Grafana.

Environment

juju version 3.1.8, loki rev 2.9.5, alertmanager rev 107

Relevant log output

not relevant

Additional context

No response

lucabello commented 2 months ago

We should modify the GrafanaSourceRequirer databag to include the Grafana external_url, so that both Loki and Prometheus will receive that.

We can then set that url in the external_url section of the ruler config (code here) (docs).
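On the receiving side, this could look roughly like the sketch below. It assumes the Grafana side publishes its URL under a databag key named `grafana_external_url`; that key name and the helper `grafana_url_from_databags` are hypothetical, since the proposal above does not define them. Databags are modeled as plain dictionaries to keep the example self-contained.

```python
from typing import Dict, List, Optional

# Hypothetical databag key; the real key would be defined by the library change.
GRAFANA_URL_KEY = "grafana_external_url"


def grafana_url_from_databags(databags: List[Dict[str, str]]) -> Optional[str]:
    """Return the first non-empty Grafana URL found in the relation databags."""
    for bag in databags:
        url = bag.get(GRAFANA_URL_KEY, "").strip()
        if url:
            return url
    # No related Grafana has published a URL (yet).
    return None
```

The returned value (or `None`) would then decide what goes into the ruler's external_url.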