Closed natalytvinova closed 9 months ago
I believe I'm hitting the same issue. juju debug-log shows:
unit-loki-0: 10:26:31 ERROR unit.loki/0.juju-log certificates:32: Checking alert rules: 400 - Bad Request
It looks like the alert rules aren't even validated and the charm code fails to obtain them.
I also use the same overlay https://github.com/canonical/cos-lite-bundle/blob/main/overlays/tls-overlay.yaml and I hit the issue after relating grafana-agents from other models to Loki.
juju debug-log output: loki_debug_log.log
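For anyone triaging a similar log, here is a minimal Python sketch that pulls the ERROR entries for the Loki unit out of saved debug-log text. The line pattern is inferred only from the single line quoted above, so other log formats may not match; `find_errors` and `LOG_LINE` are hypothetical names used for illustration.

```python
import re

# Pattern inferred from the one line quoted in this comment (an assumption):
# "<entity>: <HH:MM:SS> <LEVEL> <module> <message>"
LOG_LINE = re.compile(
    r"^(?P<entity>\S+): (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<message>.*)$"
)


def find_errors(lines, entity_prefix="unit-loki-"):
    """Return parsed ERROR entries whose entity starts with entity_prefix."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # lines that do not fit the pattern (e.g. tracebacks) are skipped
        if match["level"] == "ERROR" and match["entity"].startswith(entity_prefix):
            hits.append(match.groupdict())
    return hits


sample = [
    "unit-loki-0: 10:26:31 ERROR unit.loki/0.juju-log certificates:32: "
    "Checking alert rules: 400 - Bad Request",
]
for entry in find_errors(sample):
    print(entry["time"], entry["message"])
```

To run it against a real file, the lines of the attached log could be passed in instead of `sample`, for example by opening the file and passing the file object to `find_errors`.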
COS bundle:
---
bundle: kubernetes
applications:
  traefik:
    charm: /home/ubuntu/deployment/example/charms/cos/traefik-k8s_r129.charm
    scale: 1
    trust: true
    channel: null
    resources:
      traefik-image: ghcr.io/canonical/traefik:2.10.4
  alertmanager:
    charm: /home/ubuntu/deployment/example/charms/cos/alertmanager-k8s_r77.charm
    scale: 1
    trust: true
    channel: null
    resources:
      alertmanager-image: docker.io/ubuntu/prometheus-alertmanager:latest
  prometheus:
    charm: /home/ubuntu/deployment/example/charms/cos/prometheus-k8s_r129.charm
    scale: 1
    trust: true
    channel: null
    resources:
      prometheus-image: ghcr.io/canonical/prometheus:2.46.0
    options:
      metrics_retention_time: 62d
  grafana:
    charm: /home/ubuntu/deployment/example/charms/cos/grafana-k8s_r82.charm
    scale: 1
    trust: true
    channel: null
    resources:
      grafana-image: docker.io/ubuntu/grafana:latest
      litestream-image: docker.io/litestream/litestream:latest
  catalogue:
    charm: /home/ubuntu/deployment/example/charms/cos/catalogue-k8s_r19.charm
    scale: 1
    trust: true
    channel: null
    resources:
      catalogue-image: ghcr.io/canonical/catalogue-k8s-operator:latest
    options:
      title: Example prod2or Canonical Observability Stack
      tagline: Model-driven Observability Stack deployed with a single command.
      description: |
        Canonical Observability Stack Lite, or COS Lite, is a light-weight, highly-integrated,
        Juju-based observability suite running on Kubernetes.
  loki:
    charm: /home/ubuntu/deployment/example/charms/cos/loki-k8s_r91.charm
    scale: 1
    trust: true
    channel: null
    resources:
      loki-image: ghcr.io/canonical/loki:2.7.4
relations:
  - [traefik:ingress-per-unit, prometheus:ingress]
  - [traefik:ingress-per-unit, loki:ingress]
  - [traefik:traefik-route, grafana:ingress]
  - [traefik:ingress, alertmanager:ingress]
  - [prometheus:alertmanager, alertmanager:alerting]
  - [grafana:grafana-source, prometheus:grafana-source]
  - [grafana:grafana-source, loki:grafana-source]
  - [grafana:grafana-source, alertmanager:grafana-source]
  - [loki:alertmanager, alertmanager:alerting]
  # COS-monitoring
  - [prometheus:metrics-endpoint, traefik:metrics-endpoint]
  - [prometheus:metrics-endpoint, alertmanager:self-metrics-endpoint]
  - [prometheus:metrics-endpoint, loki:metrics-endpoint]
  - [prometheus:metrics-endpoint, grafana:metrics-endpoint]
  - [grafana:grafana-dashboard, loki:grafana-dashboard]
  - [grafana:grafana-dashboard, prometheus:grafana-dashboard]
  - [grafana:grafana-dashboard, alertmanager:grafana-dashboard]
  # Service Catalogue
  - [catalogue:ingress, traefik:ingress]
  - [catalogue:catalogue, grafana:catalogue]
  - [catalogue:catalogue, prometheus:catalogue]
  - [catalogue:catalogue, alertmanager:catalogue]
Offers:
applications:
  alertmanager:
    offers:
      alertmanager:
        endpoints:
          - karma-dashboard
  grafana:
    offers:
      grafana:
        endpoints:
          - grafana-dashboard
  loki:
    offers:
      loki:
        endpoints:
          - logging
  prometheus:
    offers:
      prometheus:
        endpoints:
          - metrics-endpoint
          - receive-remote-write
TLS:
applications:
  ca:
    charm: self-signed-certificates
    channel: edge
    scale: 1
    options:
      ca-common-name: traefik-0.traefik-endpoints.cos.svc.cluster.local
  external-ca:
    # This charm needs to be replaced with a real CA charm.
    # Use `juju refresh --switch` to replace via a "crossgrade refresh".
    charm: self-signed-certificates
    channel: edge
    scale: 1
    options:
      #ca-common-name: external-ca.example.com
      ca-common-name: traefik-0.traefik-endpoints.cos.svc.cluster.local
relations:
  # This is a more general CA (e.g. root CA) that signs traefik's own CSR.
  - [external-ca, traefik:certificates]
  # This is the local CA that signs CSRs from COS charms (excluding traefik).
  # Traefik is trusting this CA so that it could load balance via TLS.
  - [ca, traefik:receive-ca-cert]
  - [ca, alertmanager:certificates]
  - [ca, prometheus:certificates]
  - [ca, grafana:certificates]
  - [ca, loki:certificates]
  - [ca, catalogue:certificates]
Options overlay:
bundle: kubernetes
applications:
  scrape-interval-config:
    channel: null
    charm: /home/ubuntu/deployment/example/charms/cos/prometheus-scrape-config-k8s_r39.charm
    scale: 1
    trust: true
    options:
      scrape_timeout: 30s
      scrape_interval: 5m
    offers:
      scrape-interval-config:
        endpoints:
          - configurable-scrape-jobs
relations:
  - [scrape-interval-config:metrics-endpoint, prometheus:metrics-endpoint]
COS Relations in Openstack:
- ['cos-grafana:grafana-dashboard', 'cos-proxy:downstream-grafana-dashboard']
- ['cos-loki:logging', 'cos-proxy:downstream-logging']
- ['cos-prometheus:metrics-endpoint', 'cos-proxy:downstream-prometheus-scrape']
- ['cos-proxy:dashboards', 'etcd:grafana']
- ['cos-proxy:dashboards', 'prometheus-grok-exporter:dashboards']
- ['cos-proxy:dashboards', 'prometheus-openstack-exporter:dashboards']
- ['cos-proxy:dashboards', 'telegraf:dashboards']
- ['cos-proxy:filebeat', 'filebeat:logstash']
- ['cos-proxy:juju-info', 'filebeat:beats-host']
- ['cos-proxy:juju-info', 'landscape-client:container']
- ['cos-proxy:juju-info', 'nrpe:general-info']
- ['cos-proxy:juju-info', 'prometheus-grok-exporter:juju-info']
- ['cos-proxy:juju-info', 'telegraf:juju-info']
- ['cos-proxy:juju-info', 'ubuntu-advantage:juju-info']
- ['cos-proxy:monitors', 'nrpe:monitors']
- ['cos-proxy:prometheus-rules', 'telegraf:prometheus-rules']
- ['cos-proxy:prometheus-target', 'telegraf:prometheus-client']
Hi team, which channel and revision contain that fix? I'm using latest/stable right now and am facing this issue.
The fix was in revision 117: https://github.com/canonical/loki-k8s-operator/releases/tag/rev117
It's currently in latest/candidate, latest/beta and latest/edge.
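As a rough way to check whether a deployment already carries that revision, here is a minimal Python sketch. It assumes the status has been loaded from `juju status --format=json` and that the application entry exposes a `charm-rev` field; both the key layout and the helper name `has_fix` are assumptions for illustration. It also assumes that any revision at or above 117 includes the fix, which the thread does not state explicitly.

```python
FIXED_REVISION = 117  # revision named in the reply above


def has_fix(status, app="loki"):
    """Return True if the app's charm revision is at or past FIXED_REVISION.

    `status` is a dict shaped like parsed `juju status --format=json` output
    (assumed layout: status["applications"][app]["charm-rev"]).
    """
    revision = status["applications"][app]["charm-rev"]
    return int(revision) >= FIXED_REVISION


# Hand-written sample mirroring the reporter's setup (rev 105 on latest/stable);
# this is not real command output.
sample_status = {
    "applications": {
        "loki": {"charm-rev": 105, "charm-channel": "latest/stable"},
    }
}
print(has_fix(sample_status))
```

If this reports False for a store-deployed charm, moving the application to one of the channels listed above, for example with juju refresh loki --channel latest/candidate, would be the likely next step.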
Thanks @mmkay !
Bug Description
Loki from latest/stable goes into the state "Errors in alert rule groups. Check juju debug-log" after enabling the TLS overlay. Without the overlay, the charm doesn't go into this state.
To Reproduce
Environment
Juju version 3.1.7 is run locally on 3 infra nodes, and Loki is on latest/stable, rev 105. MicroK8s version: channel 1.28/stable, charm latest/stable.
Relevant log output
Here is juju show-unit loki/0: show-unit-loki.log
Here are the alert rules from the charm that are in place: alert-rules.txt