Kuadrant / kuadrant-operator

The Operator to install and manage the lifecycle of the Kuadrant components deployments.
Apache License 2.0

Design and implement a DNSPolicy focused dashboard for Grafana #649

Open maleck13 opened 6 months ago

maleck13 commented 6 months ago

What

Blocked by: https://github.com/Kuadrant/dns-operator/issues/206

Design and create a DNSPolicy-centric dashboard that allows us to step through to the components responsible for that API's ultimate enforcement. I.e. for DNSPolicy

An open question here is how we approach the cluster dimension in the views. We also need to experiment with what we want to show there. The idea is to iterate and come up with something useful to help drive ideas and requirements, not only for this API but for other APIs too.

Part of the outcome here is how we represent the originating object (the GVK and namespace/name). We may want some common metadata structure that is used in both logs and metrics to represent where the information originated.
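To make the common metadata idea concrete, here is a minimal sketch of one shared structure that yields the same keys for metric labels and structured log fields. The `Origin` type, its field names, and the `origin_*` key names are all hypothetical, as are the example group and version values; nothing here reflects an existing Kuadrant API.

```go
package main

import "fmt"

// Origin identifies the Kubernetes object that a log line or metric sample
// originated from. The type and its fields are hypothetical.
type Origin struct {
	Group     string // API group of the originating object
	Version   string // API version of the originating object
	Kind      string // Kind of the originating object
	Namespace string
	Name      string
}

// Labels flattens the origin into key/value pairs. The same map could be
// used as metric labels and as structured log fields, so both carry
// identical keys. The key names are illustrative assumptions.
func (o Origin) Labels() map[string]string {
	return map[string]string{
		"origin_group":     o.Group,
		"origin_version":   o.Version,
		"origin_kind":      o.Kind,
		"origin_namespace": o.Namespace,
		"origin_name":      o.Name,
	}
}

func main() {
	// Example values only; the group and version are assumptions.
	o := Origin{
		Group:     "kuadrant.io",
		Version:   "v1alpha1",
		Kind:      "DNSPolicy",
		Namespace: "default",
		Name:      "mypolicy",
	}
	for k, v := range o.Labels() {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```

Keeping the keys in one place means a dashboard query and a log search can filter on the same names.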

Some useful starting thoughts:

Verifying dashboard is useful:

maleck13 commented 6 months ago

@david-martin any thoughts on how we would approach the cluster dimension of these metrics? I.e. we would possibly want a dropdown to step through each cluster, see the DNSPolicies on that cluster, and then step through to the dashboards for the Kuadrant operator and DNSOperator on that cluster.

david-martin commented 6 months ago

Starting from what's there now, we have a dashboard with a table showing policies, like this:

image

I see that view as singular, whether you have 1 cluster or many. Metrics from all clusters go to a central Thanos and are used in a central Grafana. If a policy with the same name and namespace exists in more than 1 cluster, it shows up more than once in that table with no distinguishing identifier for the cluster.

Suggestions on how to improve the UX here:

A dropdown for cluster can also be included on this dashboard to allow filtering by cluster.

Side note: I've thought about aggregating policies with the same name & namespace and showing that aggregation in some smart way in that dashboard (e.g. instead of showing 5 policies from 5 clusters, just show 1 and say it's on 5 clusters), but I haven't come up with anything that makes sense yet. So I'm inclined not to do that at this time.
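For reference, if the aggregation is revisited later, the grouping step itself could look like the sketch below. It only shows collapsing rows by namespace and name and collecting the clusters; how to present the result in the dashboard remains the open part. The `PolicyRow` and `AggregatedPolicy` types are hypothetical and do not correspond to any existing metric or API.

```go
package main

import (
	"fmt"
	"sort"
)

// PolicyRow is one row of the policies table as it appears today:
// one entry per policy per cluster. Hypothetical type.
type PolicyRow struct {
	Cluster   string
	Namespace string
	Name      string
}

// AggregatedPolicy is one policy identity (namespace + name) together
// with the sorted list of clusters it was seen on. Hypothetical type.
type AggregatedPolicy struct {
	Namespace string
	Name      string
	Clusters  []string
}

// Aggregate groups rows that share namespace and name, de-duplicating
// clusters, and returns the groups sorted by namespace then name.
func Aggregate(rows []PolicyRow) []AggregatedPolicy {
	type key struct{ ns, name string }
	seen := map[key]map[string]struct{}{}
	for _, r := range rows {
		k := key{r.Namespace, r.Name}
		if seen[k] == nil {
			seen[k] = map[string]struct{}{}
		}
		seen[k][r.Cluster] = struct{}{}
	}

	out := make([]AggregatedPolicy, 0, len(seen))
	for k, clusters := range seen {
		names := make([]string, 0, len(clusters))
		for c := range clusters {
			names = append(names, c)
		}
		sort.Strings(names)
		out = append(out, AggregatedPolicy{Namespace: k.ns, Name: k.name, Clusters: names})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Namespace != out[j].Namespace {
			return out[i].Namespace < out[j].Namespace
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	rows := []PolicyRow{
		{Cluster: "cluster-b", Namespace: "default", Name: "mypolicy"},
		{Cluster: "cluster-a", Namespace: "default", Name: "mypolicy"},
		{Cluster: "cluster-a", Namespace: "other", Name: "mypolicy"},
	}
	for _, p := range Aggregate(rows) {
		fmt.Printf("%s/%s on %d cluster(s): %v\n", p.Namespace, p.Name, len(p.Clusters), p.Clusters)
	}
}
```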

Building on the suggested changes above, a link can be added from an individual policy to any other dashboard we want. Parameters can be set in these links so that dropdowns get pre-populated, if that helps the UX while moving from 1 dashboard to another. For example, I click on a DNSPolicy called 'mypolicy', which links to some new DNS operator dashboard that's prefiltered to a view of things relevant to that DNSPolicy (if that makes sense).
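As a rough illustration of such a link, the sketch below builds a dashboard URL whose dropdowns are pre-populated via query parameters. It relies on Grafana's `/d/<uid>/<slug>` dashboard path and its `var-<name>=<value>` convention for setting template variables from the URL. The base URL, dashboard UID and slug, and the variable names `cluster` and `dnspolicy` are assumptions for the example, as is the choice to skip empty values so that an unset variable filters nothing.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// DashboardLink returns a link to a Grafana dashboard with template
// variables pre-set through "var-<name>" query parameters.
// Variables with an empty value are omitted, leaving that dropdown at
// its default (an assumption made for this sketch).
func DashboardLink(baseURL, uid, slug string, vars map[string]string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	// Grafana serves dashboards under /d/<uid>/<slug>.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/d/" + uid + "/" + slug

	q := url.Values{}
	for name, value := range vars {
		if value == "" {
			continue
		}
		q.Set("var-"+name, value)
	}
	u.RawQuery = q.Encode() // Encode sorts by key, so the output is stable
	return u.String(), nil
}

func main() {
	// All values below are hypothetical.
	link, err := DashboardLink(
		"https://grafana.example.com",
		"dns-operator",
		"dns-operator",
		map[string]string{"cluster": "cluster-1", "dnspolicy": "mypolicy"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```

In practice the link would more likely be configured as a data link on the table panel in the dashboard definition; the Go function only shows the shape of the URL.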

This relies on all dashboards existing in the same Grafana instance in order for linking to work. That is, there is 1 central Grafana where all dashboards are deployed. I think this is fine and can work for single cluster too. In the case of single cluster, the cluster identifier can be empty (and so not actually filter anything) or use some single cluster identifier. I don't see a major problem here. The important thing from a maintenance point of view, IMO, is having 1 set of dashboards.

The cluster identifier dropdown can be extended to any and all dashboards we want as well. So a dns operator dashboard could be filtered by cluster centrally.

philbrookes commented 2 months ago

@david-martin could you share any notes on how to create a local setup with Grafana and how to develop a Grafana dashboard locally?

david-martin commented 2 months ago

@philbrookes https://github.com/Kuadrant/kuadrant-operator/tree/main/config/observability#observability-stack-guide Let me know if there's something else I can help with