cncf-tags / green-reviews-tooling

Project Repository for the WG Green Reviews which is part of the CNCF TAG Environmental Sustainability
https://github.com/cncf/tag-env-sustainability/tree/main/working-groups/green-reviews
Apache License 2.0

[Tracking] Collect & visualise sustainability-related metrics #20

Open nikimanoledaki opened 10 months ago

nikimanoledaki commented 10 months ago

This issue aims to investigate the sustainability-related metrics that could be implemented as part of our reference architecture.

The WG has so far identified the following use cases that each require a slightly different set of metrics:

SRE Metrics

Metrics used by CNCF project maintainers to make improvements at the application level. For example, as mentioned by @incertum in the issue linked before: Falco's own internal metrics (CPU, memory, and counters), traditional SRE metrics (CPU/mem usage), and energy metrics.

More information about this can be found in the Metrics section of the Green Reviews design document.

Sustainability Metrics

Other emerging indices that can be used to assess an application's sustainability footprint may also be considered in the future.

Benchmark-Specific Metrics

Metrics to set up the benchmark tests for each CNCF Project.


These metrics are often inter-related. For example, data about energy consumption can be used in each of these scenarios.

This issue can be used to track ideas and discussions about which metrics the Green Reviews pipeline should collect. That being said, prioritisation is key so that the WG remains on track with the milestones that the group set in the Roadmap.

nikimanoledaki commented 9 months ago

Looking at SRE Metrics, @incertum, do you already have a Grafana dashboard for these metrics? We would need to either create Prometheus queries or access them through the Falco internal metrics.

incertum commented 9 months ago

@nikimanoledaki Falco does not yet have a Prometheus exporter. We may have one for Falco 0.38 in May, but I need to check with the other maintainers. Meanwhile, Falco metrics are available as internal Falco rules whose output can be piped to log-rotated files (JSONL formatted).

I propose making the CNCF SRE Metrics independent of Falco and of Falco's metrics: report the CPU and memory usage of project binaries through your preferred framework, and create your preferred Grafana dashboards. WDYT?
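
As a rough sketch of that idea (not taken from the Falco or Green Reviews code), per-pod CPU and memory usage could be read from Prometheus using the cAdvisor metrics `container_cpu_usage_seconds_total` and `container_memory_working_set_bytes`. The helper names, the `falco` namespace, and the `falco-.*` pod pattern are assumptions for illustration only.

```python
def _selector(namespace: str, pod_regex: str) -> str:
    """Label selector shared by both queries (hypothetical helper)."""
    # container!="" drops the pod-level aggregate series that cAdvisor also reports.
    return f'namespace="{namespace}", pod=~"{pod_regex}", container!=""'


def cpu_usage_query(namespace: str, pod_regex: str, window: str = "5m") -> str:
    """PromQL for CPU cores used per pod, averaged over `window`."""
    sel = _selector(namespace, pod_regex)
    return f"sum by (pod) (rate(container_cpu_usage_seconds_total{{{sel}}}[{window}]))"


def memory_usage_query(namespace: str, pod_regex: str) -> str:
    """PromQL for the working-set memory (bytes) per pod."""
    sel = _selector(namespace, pod_regex)
    return f"sum by (pod) (container_memory_working_set_bytes{{{sel}}})"


if __name__ == "__main__":
    # Assumed namespace and pod naming; adjust to the actual deployment.
    print(cpu_usage_query("falco", "falco-.*"))
    print(memory_usage_query("falco", "falco-.*"))
```

The resulting strings could be pasted into a Grafana panel or sent to the Prometheus HTTP API, which keeps the SRE metrics independent of any project-specific exporter.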

nikimanoledaki commented 9 months ago

I wonder if there are any useful metrics in the default metrics of Kubernetes, for example:

It would be nice to somehow surface the internal Falco metrics that way, but I'm not sure if that would be possible since those would be logs, not metrics.

What is the filesystem location where the internal Falco metrics are exported? These metrics are at the Pod level, correct?

Which Falco Metrics would you find useful or relevant for either 1) performance monitoring or 2) setting up the benchmark tests?

Looking at this, I imagine "kernel.evt_rate" is one that we would definitely need for the benchmark tests.
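
To illustrate how a value such as "kernel.evt_rate" might be pulled out of JSONL-formatted log files, here is a minimal sketch. It assumes each line is a JSON object with the metrics nested under an `output_fields` key; that layout, the function name, and the sample values are assumptions and would need checking against real Falco output.

```python
import json
from typing import Iterable, Iterator


def iter_metric(lines: Iterable[str], key: str,
                container: str = "output_fields") -> Iterator[float]:
    """Yield numeric values of `key` from JSONL records (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines, e.g. truncated at a rotation boundary
        if not isinstance(record, dict):
            continue
        fields = record.get(container)
        if not isinstance(fields, dict):
            continue
        value = fields.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            yield float(value)


if __name__ == "__main__":
    # Made-up sample lines, not real Falco output.
    sample = [
        '{"output_fields": {"kernel.evt_rate": 1200.5}}',
        '{"output_fields": {"kernel.evt_rate": 800}}',
    ]
    rates = list(iter_metric(sample, "kernel.evt_rate"))
    print(sum(rates) / len(rates))
```

In practice the `lines` argument could be an open file handle on the log file, since iterating a text file yields one line at a time.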

AntonioDiTuri commented 9 months ago

I created two deep-dive tickets on the steps to collect the metrics and visualize them. I made a distinction between the Kepler and Kubernetes-related metrics, which have a more standard approach, and the Falco metrics, which need some more thought on the process. I hope that is clear; please let me know.