canonical / grafana-k8s-operator

This charmed operator automates the operational procedures of running Grafana, an open-source visualization toolkit, on Kubernetes.
https://charmhub.io/grafana-k8s
Apache License 2.0

Duplicate dashboards remove write access to database #342

Open cbartz opened 3 days ago

cbartz commented 3 days ago

Bug Description

We have a deployment where a grafana-agent and a grafana-agent-k8s application are both integrated with grafana:grafana-dashboard. Both provide a dashboard titled "System Resources" (https://github.com/canonical/grafana-agent-operator/blob/6eb1938895f5c31704917832d7d84ca9afaed799/src/grafana_dashboards/node-exporter-full.json#L23196 and https://github.com/canonical/grafana-agent-operator/blob/d77e500625a75318f16eff5fad7bb8878da141fc/src/grafana_dashboards/node-exporter-full.json#L23196). This leads to warnings like:

2024-07-04T09:05:19.197Z [grafana] logger=provisioning.dashboard t=2024-07-04T09:05:19.197743344Z level=warn msg="dashboard title is not unique in folder" orgId=1 title="System Resources" folderID=0 times=2 providers=[Default]
2024-07-04T09:05:19.197Z [grafana] logger=provisioning.dashboard t=2024-07-04T09:05:19.197764885Z level=warn msg="dashboards provisioning provider has no database write permissions because of duplicates" provider=Default orgId=1
2024-07-04T09:05:19.223Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-07-04T09:05:19.223731211Z level=warn msg="Not saving new dashboard due to restricted database access" provisioner=Default file=/etc/grafana/provisioning/dashboards/juju_grafana-agent-k8s_0def0c2.json folderId=0

and the provisioning of some other dashboards failed in our production environment (they did not appear in Grafana). Grafana seems to remove write access to the database in this case: https://github.com/grafana/grafana/issues/43530
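To check a unit for this condition, a small script can group the provisioned dashboard files by title, which is what Grafana's "dashboard title is not unique in folder" warning counts. This is a minimal sketch, not part of the charm: it assumes each file in /etc/grafana/provisioning/dashboards/ holds a single dashboard JSON object with a top-level "title", and, matching folderID=0 in the logs above, that all dashboards land in the same folder.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_dashboards(directory: str) -> Dict[str, dict]:
    """Parse every *.json file in a provisioning directory, keyed by file name."""
    return {p.name: json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))}


def find_duplicate_titles(dashboards: Dict[str, dict]) -> Dict[str, List[str]]:
    """Return {title: [file names]} for every title provided by more than one file.

    Assumes all dashboards are provisioned into the same folder, so the title
    alone is enough to detect the collision Grafana warns about.
    """
    files_by_title: Dict[str, List[str]] = defaultdict(list)
    for filename, dashboard in dashboards.items():
        files_by_title[dashboard.get("title", "")].append(filename)
    # Keep only the colliding titles; unique ones are not a problem.
    return {title: sorted(files) for title, files in files_by_title.items() if len(files) > 1}
```

For example, running find_duplicate_titles(load_dashboards("/etc/grafana/provisioning/dashboards")) inside the grafana container should list "System Resources" together with the files contributed by both agent applications.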

Besides this, we also have an application deployed with three units that provides the same dashboard twice in the relation data, which leads to:

2024-07-04T09:05:19.197Z [grafana] logger=provisioning.dashboard t=2024-07-04T09:05:19.197753914Z level=warn msg="dashboard title is not unique in folder" orgId=1 title="Synapse Operator" folderID=0 times=2 providers=[Default]

Running juju show-unit grafana/0 gives:

 "37": [{"id": "grafana-dashboard:37/file:synapse", "original_id": "file:synapse"

        "juju_topology": {"model": "prod-synapse-k8s", "model_uuid": "cf54a174-d8f9-4a6a-8e43-2a17d555c60b",
        "application": "synapse", "unit": "synapse/0"}

and

 "110:" 
        "grafana-dashboard:110/file:synapse", "original_id": "file:synapse", "content":
        "juju_topology": {"model": "prod-synapse-k8s", "model_uuid": "cf54a174-d8f9-4a6a-8e43-2a17d555c60b",
        "application": "synapse", "unit": "synapse/1"}, "inject_dropdowns": true,
        "dashboard_alt_uid": "528989afbcc43cea"}, "valid": true, "error": null}],
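Since both entries carry the same original_id ("file:synapse") and the same application, differing only in unit and relation id, one way to collapse them is to key stored dashboards by model, application, and original_id. This is a hypothetical helper, not the grafana_dashboard library's code; the field names are taken from the juju show-unit output above, and relation ids are assumed to be numeric strings.

```python
from typing import Dict, List, Tuple


def dedupe_across_units(dashboards_by_relation: Dict[str, List[dict]]) -> List[dict]:
    """Collapse dashboards that several units of one application publish.

    Entries are keyed by (model_uuid, application, original_id); the unit name
    and relation id are ignored, so synapse/0 and synapse/1 both publishing
    "file:synapse" yield a single dashboard. Lower relation ids win.
    """
    kept: Dict[Tuple[str, str, str], dict] = {}
    # Sort relation ids numerically so "37" is visited before "110".
    for relation_id in sorted(dashboards_by_relation, key=int):
        for entry in dashboards_by_relation[relation_id]:
            topology = entry.get("juju_topology", {})
            key = (
                topology.get("model_uuid", ""),
                topology.get("application", ""),
                entry["original_id"],
            )
            kept.setdefault(key, entry)  # the first entry seen for a key is kept
    return list(kept.values())
```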

To Reproduce

Deploy a Grafana instance and relate two different Grafana agent deployments (for example, grafana-agent and grafana-agent-k8s) to it. This should produce the log warnings above.

Environment

grafana-k8s latest/edge rev 112, Juju 3.1.8

Relevant log output

2024-07-04T09:05:19.197Z [grafana] logger=provisioning.dashboard t=2024-07-04T09:05:19.197743344Z level=warn msg="dashboard title is not unique in folder" orgId=1 title="System Resources" folderID=0 times=2 providers=[Default]
2024-07-04T09:05:19.197Z [grafana] logger=provisioning.dashboard t=2024-07-04T09:05:19.197753914Z level=warn msg="dashboard title is not unique in folder" orgId=1 title="Synapse Operator" folderID=0 times=2 providers=[Default]
2024-07-04T09:05:19.197Z [grafana] logger=provisioning.dashboard t=2024-07-04T09:05:19.197764885Z level=warn msg="dashboards provisioning provider has no database write permissions because of duplicates" provider=Default orgId=1
2024-07-04T09:05:19.223Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-07-04T09:05:19.223731211Z level=warn msg="Not saving new dashboard due to restricted database access" provisioner=Default file=/etc/grafana/provisioning/dashboards/juju_grafana-agent-k8s_0def0c2.json folderId=0
2024-07-04T09:05:19.225Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-07-04T09:05:19.225705277Z level=warn msg="Not saving new dashboard due to restricted database access" provisioner=Default file=/etc/grafana/provisioning/dashboards/juju_grafana-agent_6a995b8.json folderId=0
2024-07-04T09:05:19.227Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-07-04T09:05:19.227262179Z level=warn msg="Not saving new dashboard due to restricted database access" provisioner=Default file=/etc/grafana/provisioning/dashboards/juju_grafana-agent_a25da1b.json folderId=0
2024-07-04T09:05:19.236Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-07-04T09:05:19.236773711Z level=warn msg="Not saving new dashboard due to restricted database access" provisioner=Default file=/etc/grafana/provisioning/dashboards/juju_grafana-agent-k8s_276932e.json folderId=0
2024-07-04T09:05:19.242Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-07-04T09:05:19.242342984Z level=warn msg="Not saving new dashboard due to restricted database access" provisioner=Default file=/etc/grafana/provisioning/dashboards/juju_synapse_9fbd610.json folderId=0

sed-i commented 3 days ago

Thanks for the detailed report @cbartz! In the upstream issue you linked, it was noted that both the uid and the title must be unique within a folder, but currently, as you point out, both the title and the uid of the System Resources dashboard are the same.

Currently, we dedupe the dashboard filename, but nothing else.

The thing is that we should in fact only have one System Resources dashboard on Grafana's filesystem: we do not want the same dashboard to appear multiple times.
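For the case where the copies really are the same dashboard, a content-based key would catch them where filename deduplication does not. The sketch below is an assumption, not current charm behaviour: it hashes each dashboard's JSON in a canonical form (sorted keys, no whitespace) and keeps the first copy of each hash. It only helps when the rendered JSON is truly identical; if per-application topology is injected into the content, the copies will hash differently and the uid/title collision remains.

```python
import hashlib
import json
from typing import Dict, List


def content_key(dashboard: dict) -> str:
    """Hash a dashboard's JSON in canonical form so key order and whitespace don't matter."""
    canonical = json.dumps(dashboard, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_identical(dashboards: List[dict]) -> List[dict]:
    """Keep the first copy of each dashboard whose canonical content is identical."""
    seen: Dict[str, dict] = {}
    for dashboard in dashboards:
        seen.setdefault(content_key(dashboard), dashboard)
    return list(seen.values())
```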

We should try to reproduce this to see whether a revision bump helps. If one charm gets a dashboard update (and a revision bump), then we'd have two dashboards that differ in content but not in uid/title.

For example:

```mermaid
graph LR
ga1["agent1 (grafana-agent-k8s)"] --- grafana
ga2["agent2 (grafana-agent-k8s)"] --- grafana
ga3["agent3 (grafana-agent)"] --- grafana
```

  1. Check if two dashboards with different revision numbers still cause this error.
  2. Come up with a proposal for how to deal with "dashboard forking" from charm to charm (for example, Loki uses node exporter dashboard x and grafana-agent uses node exporter dashboard y, with a slightly different set of metrics; or a dashboard is updated across an upgrade). One option is for Grafana to keep only the latest revision.
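The "keep only the latest revision" option could look roughly like the sketch below. It is a hypothetical illustration under two assumptions: forks of a dashboard keep the same uid (as with System Resources today), and the charms bump the dashboard JSON's integer "version" field whenever they ship an update, used here as a stand-in for the revision. Ties keep the first dashboard seen.

```python
from typing import Dict, List


def keep_latest_version(dashboards: List[dict]) -> List[dict]:
    """Keep only the highest-"version" dashboard for each uid.

    Assumes forks share a uid and that a missing "version" counts as 0.
    On equal versions the earlier dashboard is kept.
    """
    latest: Dict[str, dict] = {}
    for dashboard in dashboards:
        uid = dashboard["uid"]
        current = latest.get(uid)
        if current is None or dashboard.get("version", 0) > current.get("version", 0):
            latest[uid] = dashboard
    return list(latest.values())
```

A design caveat: this silently drops the older fork, so a charm relying on metrics that only its own version of the dashboard uses (the Loki versus grafana-agent case above) would lose those panels.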