canonical / grafana-k8s-operator

This charmed operator automates the operational procedures of running Grafana, an open-source visualization toolkit, on Kubernetes.
https://charmhub.io/grafana-k8s
Apache License 2.0

Dashboard forwarding from cos-configuration-k8s can be unreliable #312

Open Batalex opened 4 months ago

Batalex commented 4 months ago

Bug Description

I have an issue with custom dashboards from cos-configuration-k8s not appearing in the grafana interface.

I managed to pinpoint the source of the issue to this charm because the custom dashboards are present in the relation databag, as well as in the grafana container.

juju ssh --container grafana grafana/0 ls -1 /etc/grafana/provisioning/dashboards

default.yaml
juju_alertmanager-k8s_e9224b0.json
juju_cos-configuration-k8s_043a2b3.json
juju_cos-configuration-k8s_af3132d.json
juju_grafana-agent_0def0c2.json
juju_grafana-agent_6545430.json
juju_grafana-agent_ab32508.json
juju_grafana-agent_feefa09.json
juju_loki-k8s_0804127.json
juju_prometheus-k8s_35dd368.json
self_dashboard.json

Note that two cos-configuration-k8s files are present in the output above, but the corresponding dashboards do not appear in Grafana.

I can sometimes work around the issue by scaling grafana up and down, but this does not reliably fix it.
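To make the mismatch easier to confirm, here is a minimal sketch that compares dashboard files against what Grafana reports as loaded. It assumes the provisioned files have been copied to a local directory, that each one is plain dashboard JSON with a top-level `uid`, and that `search_results` is the decoded JSON list returned by Grafana's dashboard search API (`GET /api/search?type=dash-db`). The function names are hypothetical and not part of the charm.

```python
import json
from pathlib import Path


def dashboard_uids_on_disk(directory: Path) -> dict[str, str]:
    """Map each dashboard uid found in the directory's *.json files to its file name."""
    uids: dict[str, str] = {}
    for path in sorted(directory.glob("*.json")):
        try:
            uid = json.loads(path.read_text()).get("uid")
        except (json.JSONDecodeError, AttributeError):
            # Skip files that are not valid JSON objects.
            continue
        if uid:
            uids[uid] = path.name
    return uids


def missing_from_grafana(on_disk: dict[str, str], search_results: list[dict]) -> list[str]:
    """Return the file names whose uid is absent from the search results."""
    loaded = {item.get("uid") for item in search_results}
    return sorted(name for uid, name in on_disk.items() if uid not in loaded)
```

If the two `juju_cos-configuration-k8s_*.json` files show up in the returned list, the files were written to disk but Grafana has not loaded them.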

To Reproduce

I have not been able to find a way to consistently reproduce the issue. However, in all cases, I had multiple grafana agents related to the monitoring stack.

COS - juju export bundle

```yaml
bundle: kubernetes
saas:
  remote-8ae57c5a420b4e8c889fd8eba6c28be9: {}
  remote-57789c2419f64cb8874a0822ebaa787b: {}
applications:
  alertmanager:
    charm: alertmanager-k8s
    channel: stable
    revision: 101
    resources:
      alertmanager-image: 87
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,2048M
    trust: true
  catalogue:
    charm: catalogue-k8s
    channel: stable
    revision: 33
    resources:
      catalogue-image: 32
    scale: 1
    options:
      description: "Canonical Observability Stack Lite, or COS Lite, is a light-weight, highly-integrated, \nJuju-based observability suite running on Kubernetes.\n"
      tagline: Model-driven Observability Stack deployed with a single command.
      title: Canonical Observability Stack
    constraints: arch=amd64
    trust: true
  cos-configuration-k8s:
    charm: cos-configuration-k8s
    channel: stable
    revision: 45
    resources:
      git-sync-image: 32
    scale: 1
    options:
      git_branch: main
      git_repo: https://github.com/batalex/cos-rules
      grafana_dashboards_path: grafana/dashboards/
      prometheus_alert_rules_path: rules/
    constraints: arch=amd64
    storage:
      content-from-git: kubernetes,1,1024M
    trust: true
  grafana:
    charm: grafana-k8s
    channel: stable
    revision: 105
    resources:
      grafana-image: 68
      litestream-image: 43
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,2048M
    trust: true
  loki:
    charm: loki-k8s
    channel: stable
    revision: 118
    resources:
      loki-image: 91
    scale: 1
    constraints: arch=amd64
    storage:
      active-index-directory: kubernetes,1,2048M
      loki-chunks: kubernetes,1,10240M
    trust: true
  prometheus:
    charm: prometheus-k8s
    channel: stable
    revision: 170
    resources:
      prometheus-image: 139
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,10240M
    trust: true
  traefik:
    charm: traefik-k8s
    channel: stable
    revision: 169
    resources:
      traefik-image: 158
    scale: 1
    constraints: arch=amd64
    storage:
      configurations: kubernetes,1,1024M
    trust: true
relations:
- - traefik:ingress-per-unit
  - prometheus:ingress
- - traefik:ingress-per-unit
  - loki:ingress
- - traefik:traefik-route
  - grafana:ingress
- - traefik:ingress
  - alertmanager:ingress
- - prometheus:alertmanager
  - alertmanager:alerting
- - grafana:grafana-source
  - prometheus:grafana-source
- - grafana:grafana-source
  - loki:grafana-source
- - grafana:grafana-source
  - alertmanager:grafana-source
- - loki:alertmanager
  - alertmanager:alerting
- - prometheus:metrics-endpoint
  - traefik:metrics-endpoint
- - prometheus:metrics-endpoint
  - alertmanager:self-metrics-endpoint
- - prometheus:metrics-endpoint
  - loki:metrics-endpoint
- - prometheus:metrics-endpoint
  - grafana:metrics-endpoint
- - grafana:grafana-dashboard
  - loki:grafana-dashboard
- - grafana:grafana-dashboard
  - prometheus:grafana-dashboard
- - grafana:grafana-dashboard
  - alertmanager:grafana-dashboard
- - catalogue:ingress
  - traefik:ingress
- - catalogue:catalogue
  - grafana:catalogue
- - catalogue:catalogue
  - prometheus:catalogue
- - catalogue:catalogue
  - alertmanager:catalogue
- - grafana:grafana-dashboard
  - remote-57789c2419f64cb8874a0822ebaa787b:grafana-dashboards-provider
- - loki:logging
  - remote-57789c2419f64cb8874a0822ebaa787b:logging-consumer
- - prometheus:receive-remote-write
  - remote-57789c2419f64cb8874a0822ebaa787b:send-remote-write
- - grafana:grafana-dashboard
  - remote-8ae57c5a420b4e8c889fd8eba6c28be9:grafana-dashboards-provider
- - loki:logging
  - remote-8ae57c5a420b4e8c889fd8eba6c28be9:logging-consumer
- - prometheus:receive-remote-write
  - remote-8ae57c5a420b4e8c889fd8eba6c28be9:send-remote-write
- - cos-configuration-k8s:grafana-dashboards
  - grafana:grafana-dashboard
- - cos-configuration-k8s:prometheus-config
  - prometheus:metrics-endpoint
--- # overlay.yaml
applications:
  alertmanager:
    offers:
      alertmanager-karma-dashboard:
        endpoints:
        - karma-dashboard
        acl:
          admin: admin
  grafana:
    offers:
      grafana-dashboards:
        endpoints:
        - grafana-dashboard
        acl:
          admin: admin
  loki:
    offers:
      loki-logging:
        endpoints:
        - logging
        acl:
          admin: admin
  prometheus:
    offers:
      prometheus-receive-remote-write:
        endpoints:
        - receive-remote-write
        acl:
          admin: admin
```

Environment

Relevant log output

unit-grafana-0: 14:44:51 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:49:04 WARNING unit.grafana/0.juju-log <class '__main__.GrafanaCharm'>.<property object at 0x7f8f52db4090> returned None; continuing with tracing DISABLED.
unit-grafana-0: 14:49:05 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:53:29 WARNING unit.grafana/0.juju-log <class '__main__.GrafanaCharm'>.<property object at 0x7fc7421142c0> returned None; continuing with tracing DISABLED.
unit-grafana-0: 14:53:29 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-grafana-0: 14:57:51 WARNING unit.grafana/0.juju-log <class '__main__.GrafanaCharm'>.<property object at 0x7fa9c039e220> returned None; continuing with tracing DISABLED.
unit-grafana-0: 14:57:52 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-grafana-0: 15:02:21 WARNING unit.grafana/0.juju-log <class '__main__.GrafanaCharm'>.<property object at 0x7fd1153ac2c0> returned None; continuing with

Additional context

No response

lucabello commented 1 month ago

We are probably missing a restart on that hook!
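As a rough illustration of that idea (not the charm's actual code), the sketch below restarts the workload only when the set of dashboard files has changed since the last hook run. The function names and the `restart` callback are hypothetical. Which hook is affected and how the charm restarts Grafana are not shown in this thread, so the callback stands in for whatever restart mechanism the charm uses.

```python
import hashlib
from typing import Callable, Mapping, Optional


def dashboards_digest(dashboards: Mapping[str, str]) -> str:
    """Return a stable digest over (file name, content) pairs."""
    h = hashlib.sha256()
    for name in sorted(dashboards):
        h.update(name.encode())
        h.update(b"\0")
        h.update(dashboards[name].encode())
        h.update(b"\0")
    return h.hexdigest()


def restart_if_changed(
    dashboards: Mapping[str, str],
    previous_digest: Optional[str],
    restart: Callable[[], None],
) -> str:
    """Call restart() when the dashboards differ from the previous run.

    Returns the current digest so the caller can persist it for the next run.
    """
    current = dashboards_digest(dashboards)
    if current != previous_digest:
        restart()
    return current
```

Comparing digests avoids restarting Grafana on every hook invocation while still picking up dashboards that arrive late over the relation.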