giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

Consistent information model #3395

Open marians opened 3 months ago

marians commented 3 months ago

This issue is an early attempt to explain a challenge we have. It still needs to be defined more precisely before it can become a meaningful task description / roadmap item.

Summary

How can we enable customers to find information throughout our platform, filtered down to exactly the depth and detail level they need?

"Information" can mean a lot of things. Examples: metrics and dashboards, log entries, events, Kyverno policy reports, Falco reports, incidents, ...

For example, if a customer wants log entries and metrics about a certain application, how do they select exactly this application in Grafana? What if they want Kyverno policy reports only for that application? What if they want to look at only one application in the Hubble UI (Cilium)?

Or, imagine a team on the customer side wants to find all policy reports generated for its workloads. How can they drill down to policy reports only for their team?
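
To make the question concrete, here is a minimal sketch of what a consistent information model could allow: every kind of record carries the same small set of labels, so one selector works across logs, metrics, and policy reports. The `Record` type, the `select` helper, and the label keys `app` and `team` are hypothetical and only illustrate the idea; they are not an existing Giant Swarm API or label convention.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One piece of information: a log entry, metric sample, policy report, ..."""
    kind: str
    labels: dict = field(default_factory=dict)
    payload: str = ""


def select(records, **wanted):
    """Return the records whose labels match every key/value pair in `wanted`."""
    return [
        r for r in records
        if all(r.labels.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical sample data; the label keys "app" and "team" are assumptions.
records = [
    Record("log", {"app": "shop", "team": "checkout"}, "payment accepted"),
    Record("policy_report", {"app": "shop", "team": "checkout"}, "1 violation"),
    Record("policy_report", {"app": "search", "team": "discovery"}, "0 violations"),
    # This record has no team label, so it cannot be found by a team filter.
    Record("metric", {"app": "search"}, "cpu=0.4"),
]

# "All policy reports for my team"
team_reports = [
    r for r in select(records, team="checkout") if r.kind == "policy_report"
]

# "Everything about one application", regardless of record kind
app_records = select(records, app="shop")
```

The last sample record shows the failure mode discussed in this issue: as soon as one source omits a label, that source silently drops out of any drill-down based on it.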

marians commented 3 months ago

I talked about this problem with @teemow on Friday. He suggested discussing it with team Atlas.

I am also working on slides to explain the problem more easily.

teemow commented 3 months ago

Btw, I've looked into the Grafana Cloud metrics* and this is what I found:

There are customer/installation labels, but none for apps or teams. It's also not easy to add such labels to the prometheus_rules aggregation[1], because the raw metrics don't carry them either. There is an app label, but it only contains cadvisor. I don't know how we can make sure the right metadata gets added. It would be good if Honey Badger and Atlas could figure this out.

/cc @piontec @weatherhog @Rotfuks

*I checked Grafana Cloud because I wanted to build a dashboard that shows the CPU/memory footprint of apps per team across all installations.

[1] https://github.com/giantswarm/prometheus-rules/blob/f7c2bf17119715afea7e5427970e3a484cd0266a/helm/prometheus-rules/templates/recording-rules/grafana-cloud.rules.yml

teemow commented 3 months ago

We have team labels in charts, afaik. The question is how we can propagate this data to the metrics in Prometheus.

https://github.com/giantswarm/giantswarm/issues/20958
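
As a rough illustration of the propagation question, the sketch below attaches a `team` label to raw metric samples by looking up the owning workload, then sums CPU usage per team across installations. Everything in it is an assumption made for the example: the `TEAM_BY_WORKLOAD` mapping stands in for whatever team metadata the charts provide, and the sample layout and the `namespace`/`workload` label names are hypothetical. It does not describe how Prometheus or the existing recording rules work.

```python
from collections import defaultdict

# Hypothetical lookup derived from chart metadata: (namespace, workload) -> team.
TEAM_BY_WORKLOAD = {
    ("shop", "frontend"): "checkout",
    ("search", "indexer"): "discovery",
}


def add_team_label(sample, team_by_workload, default="unknown"):
    """Return a copy of `sample` whose labels include a `team` label."""
    labels = dict(sample["labels"])
    key = (labels.get("namespace"), labels.get("workload"))
    labels["team"] = team_by_workload.get(key, default)
    return {**sample, "labels": labels}


def cpu_by_team(samples):
    """Sum sample values per team, across all installations."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample["labels"]["team"]] += sample["value"]
    return dict(totals)


# Hypothetical raw samples: they carry an installation label but no team label.
samples = [
    {"labels": {"installation": "alpha", "namespace": "shop", "workload": "frontend"}, "value": 0.5},
    {"labels": {"installation": "beta", "namespace": "shop", "workload": "frontend"}, "value": 0.25},
    {"labels": {"installation": "alpha", "namespace": "search", "workload": "indexer"}, "value": 1.0},
    {"labels": {"installation": "alpha", "namespace": "kube-system", "workload": "coredns"}, "value": 0.125},
]

enriched = [add_team_label(s, TEAM_BY_WORKLOAD) for s in samples]
totals = cpu_by_team(enriched)
```

Workloads without a known owner end up under `unknown` instead of being dropped, which keeps the gap in the metadata visible on a per-team dashboard.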

Rotfuks commented 3 months ago

I'm happy to talk with you about how this can fit into our observability platform strategy and what ideas we have on our side to address this, @marians.