Green-Software-Foundation / hack

Carbon Hack 24 - The annual hackathon from the Green Software Foundation
https://grnsft.org/hack/github
k8s importer and visualization #84

Open adamaucamp opened 4 months ago

adamaucamp commented 4 months ago

Prize category

Best Plugin

Overview

Kubernetes has emerged as the de facto container orchestration platform and is supported by all major CSPs.

We intend to build a k8s metrics importer that reads metrics from the k8s metrics-server and, in near real time, exposes them as Prometheus metrics to be scraped by Prometheus and subsequently visualised in Grafana. This will give us carbon metrics at the same rate as other operational metrics such as utilization. Real-time carbon metrics in Prometheus open up many possibilities for alerts and for automations that trigger when certain thresholds are reached.

Questions to be answered

No response

Have you got a project team yet?

Yes and we aren't recruiting

Project team

@gholtzhausen, @kungelaxyz, @tshepotshabalala, @Njuraa, @ShanelUchee02, @adamaucamp, @Choogenhout

Terms of Participation


Project Submission

Summary:

Kubernetes (k8s) is the main container orchestration platform, supported by all major Cloud Service Providers (CSPs).

Our project revolves around importing real-time metrics from Kubernetes, processing those metrics through a standard Impact Framework (IF) pipeline to derive the SCI (Software Carbon Intensity), and then exporting them as Prometheus metrics. These metrics are subsequently scraped by a Prometheus server and visualised in Grafana, appearing alongside the regular performance metrics that we currently collect from Kubernetes.
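As a rough illustration of the "derive the SCI" step, the sketch below applies the SCI formula from the GSF specification, SCI = ((E × I) + M) per R, to a single observation. The `Observation` shape and its field names are assumptions made for this example; they are not the IF's schema or the plugins' actual output.

```typescript
// Hypothetical shape of one observation; field names are illustrative only.
interface Observation {
  energyKwh: number;       // E: energy consumed in the observation window (kWh)
  gridIntensity: number;   // I: carbon intensity of the grid (gCO2e per kWh)
  embodiedGrams: number;   // M: embodied emissions amortised to the window (gCO2e)
  functionalUnits: number; // R: functional unit count, e.g. requests served
}

// SCI = ((E * I) + M) / R, expressed in gCO2e per functional unit.
function sci(o: Observation): number {
  if (o.functionalUnits <= 0) {
    throw new RangeError("functionalUnits must be positive");
  }
  return (o.energyKwh * o.gridIntensity + o.embodiedGrams) / o.functionalUnits;
}

// Made-up numbers purely for demonstration.
const exampleObservation: Observation = {
  energyKwh: 0.5,
  gridIntensity: 800,
  embodiedGrams: 100,
  functionalUnits: 1000,
};
const exampleSci = sci(exampleObservation); // (0.5 * 800 + 100) / 1000
```

In the real pipeline this calculation is performed by the IF itself; the sketch only shows what quantity ends up being exported.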

This enables us to scale our application based on Carbon metrics, treating them with the same importance as e.g. CPU or memory. Thus, Carbon becomes a top priority when evaluating application performance and efficiency.

Problem:

Our solution addresses the pressing need for real-time metric acquisition and visualisation, with a particular focus on data generated from the IF. Despite the widespread adoption of Kubernetes across our organization, Nedbank, we encountered a significant gap: the absence of emissions tracking capability within Kubernetes. Furthermore, the IF itself lacked real-time emissions monitoring, and existing solutions did not offer this capability.

Aiming to quantify the environmental impact of software, we identified several underlying problems to solve. Firstly, we needed to procure real-time usage metrics from k8s, which is our chosen compute platform. Thereafter, we implemented a system to export these metrics as Prometheus metrics and expose them on an endpoint accessible to a Prometheus server for scraping.
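To make the "export as Prometheus metrics" step concrete, here is a minimal sketch that renders samples in the Prometheus text exposition format (a `# HELP` line, a `# TYPE` line, then one line per labelled sample). The metric name `if_sci_grams`, the label names, and the helper functions are hypothetical and chosen for illustration; the actual exporter plugin may structure its output differently.

```typescript
interface Sample {
  labels: Record<string, string>;
  value: number;
}

// Escape backslash, double quote and newline, as the text format requires
// for label values.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one gauge metric family as Prometheus exposition text.
function renderGauge(name: string, help: string, samples: Sample[]): string {
  const lines: string[] = [`# HELP ${name} ${help}`, `# TYPE ${name} gauge`];
  for (const s of samples) {
    const labelText = Object.entries(s.labels)
      .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
      .join(",");
    lines.push(labelText ? `${name}{${labelText}} ${s.value}` : `${name} ${s.value}`);
  }
  return lines.join("\n") + "\n";
}

// Hypothetical metric and labels for a single pod.
const exposition = renderGauge(
  "if_sci_grams",
  "SCI per functional unit (hypothetical metric name)",
  [{ labels: { namespace: "default", pod: "web-1" }, value: 0.5 }],
);
```

A gauge is used here because an SCI reading can go up or down between scrapes.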

Leveraging the IF as an open-source tool, our solution aims to bridge these gaps and provide a comprehensive approach to measuring and visualising the environmental impact of software in real time.

Application:

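The application consists of three key components: the k8s-importer plugin, the Prometheus-exporter plugin, and an Express.js server that exposes the Prometheus-formatted metrics.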

These importer and exporter components are integrated into a standard IF file, which is executed when a Prometheus server scrapes the metrics from the server.
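The scrape-triggered execution described above can be sketched as a small request handler that runs the pipeline only when `/metrics` is requested. This is an illustrative outline, not the project's Express.js code: `handleScrape`, `Pipeline`, and the metric name are hypothetical, and the pipeline is modelled as a plain synchronous callback to keep the example self-contained.

```typescript
interface ScrapeResponse {
  status: number;
  contentType: string;
  body: string;
}

// Stand-in for "run the IF file and collect its outputs" (assumption).
type Pipeline = () => Array<{ name: string; value: number }>;

function handleScrape(path: string, runPipeline: Pipeline): ScrapeResponse {
  if (path !== "/metrics") {
    return { status: 404, contentType: "text/plain", body: "not found\n" };
  }
  try {
    // The pipeline runs on demand, once per scrape.
    const body = runPipeline().map((m) => `${m.name} ${m.value}`).join("\n") + "\n";
    // Content type of the Prometheus text exposition format.
    return { status: 200, contentType: "text/plain; version=0.0.4", body };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: 500, contentType: "text/plain", body: `pipeline failed: ${message}\n` };
  }
}

// Demonstration with a fake pipeline that counts how often it is executed.
let pipelineRuns = 0;
const demoPipeline: Pipeline = () => {
  pipelineRuns += 1;
  return [{ name: "if_sci_grams", value: 0.5 }];
};
const scrapeOk = handleScrape("/metrics", demoPipeline);
const scrapeMiss = handleScrape("/health", demoPipeline);
```

One consequence of this design is that metric freshness follows the Prometheus scrape interval, so a slow pipeline directly lengthens each scrape.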

The application is designed to operate within a standard k8s cluster with the metrics-server installed. To streamline visualisation and scraping processes, we opted for the kube-prometheus stack, the go-to Prometheus installation for monitoring and visualizing Kubernetes metrics.

Additionally, the solution features a default dashboard that can be effortlessly imported into Grafana. This dashboard includes a pre-configured alert set to trigger when carbon usage (SCI) surpasses a specified threshold within any 5-minute window.
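One possible reading of the alert condition is "fire if any sample in the trailing five minutes exceeds the threshold". The sketch below implements that reading in plain TypeScript so the logic is easy to inspect; in practice the rule lives in Grafana, and the function name, sample shape, and threshold values here are assumptions for illustration.

```typescript
interface TimedSample {
  timestampMs: number;
  value: number;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// True if any sample inside the trailing window (nowMs - windowMs, nowMs]
// is strictly above the threshold.
function exceedsInWindow(
  samples: TimedSample[],
  nowMs: number,
  threshold: number,
  windowMs: number = FIVE_MINUTES_MS,
): boolean {
  const windowStart = nowMs - windowMs;
  return samples.some(
    (s) => s.timestampMs > windowStart && s.timestampMs <= nowMs && s.value > threshold,
  );
}

// Made-up readings: an old spike outside the window, then quiet values.
const quietSamples: TimedSample[] = [
  { timestampMs: 0, value: 10 },
  { timestampMs: 400_000, value: 0.4 },
];
// Same readings plus a recent spike inside the window.
const spikeSamples: TimedSample[] = [...quietSamples, { timestampMs: 500_000, value: 1.5 }];

const quietAlert = exceedsInWindow(quietSamples, 600_000, 1);
const spikeAlert = exceedsInWindow(spikeSamples, 600_000, 1);
```

An average over the window would be a reasonable alternative if short spikes should not trigger the alert.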

This setup ensures seamless integration, efficient monitoring, and proactive alerting, making it easier to manage and optimise our carbon footprint in real time.

Prize category:

Best Plugin

Judging Criteria

Overall Impact:

Our solution facilitates real-time export and visualization of metrics via the IF -> Prometheus -> Grafana pipeline, enabling the calculation and export of any metric in real time. This shifts sustainability from being an afterthought to a proactive decision. Achieving this impact requires monitoring teams to incorporate IF metrics into their monitoring practices and to be willing to share and publish the YAML files that contain the pipelines, which inform these real-time metrics.

Opportunity:

While our chosen platform for the purposes of the hack was Kubernetes, our system can be applied to any metric for real-time calculation and export. The IF’s composability allows any valid YAML file to add the Prometheus exporter, facilitating metric export and visualisation. We intentionally decomposed our implementation into independent parts so that each can be used on its own in a variety of scenarios, without requiring the others.

Modular:

Our implementation aligns well with the Unix philosophy and micro-model architecture. The k8s-importer and Prometheus-exporter plugins each perform a single task efficiently. The Express.js server seamlessly exposes Prometheus-formatted metrics from the IF, ensuring modularity and interoperability with other plugins. Each component can function independently or in conjunction with other official and unofficial plugins, promoting flexibility and integration within the Green Software Foundation (GSF) ecosystem.

Video

Youtube Link

Artefacts

The kubernetes importer plugin: nb-green-ops/if-k8s-metrics-importer (github.com)

The prometheus exporter plugin: nb-green-ops/if-prometheus-exporter (github.com)

The main "hack" application repo that brings it all together: nb-green-ops/carbon-hack-24 (github.com)

Usage

The main usage README is located in the main repo: carbon-hack-24/README.md at main · nb-green-ops/carbon-hack-24 (github.com)

Process

Being from an organization versed in the Agile way of doing things, we started by identifying the end goal and the tasks needed to complete it. We then prioritized and distributed these tasks based on each team member’s skill set. Checking in regularly was key, as we are a distributed team. We had standups to facilitate this and to ensure everyone was on the right track. We soon started integrating everyone’s work and holding larger working sessions where everyone collaborated to put the solution together and make sure all the pieces fit.

This iterative process continued as we solved challenges and changed implementations as needed for the various components to interface correctly. We finished the technical part of the hack sooner than expected and had some fun figuring out video editing tools and discovering the youtuber within ourselves.

Inspiration

At Nedbank, our organisation is actively engaged in several strategic initiatives centered around ESG and sustainability. Among these initiatives is the modernisation of numerous legacy applications, often involving containerisation and deployment to Kubernetes. This is the case for most financial service organisations in South Africa.

We aim to embed Green Engineering principles and practices within our squads. As Peter Drucker said, “You can’t improve what you don’t measure”. Therefore, we recognize the need for a real-time way to measure and inform squads about their performance from an ESG perspective. As we know, the key part of making ESG real is not just to measure for the sake of measuring but instead to measure what matters.

Challenges

Dynamic Infrastructure:

The rigidity of the current framework posed a challenge, as it lacks a mechanism to dynamically generate a tree of components before pipeline execution. In modern cloud environments, and particularly within Kubernetes, pods and nodes are constantly created and deleted, so the component tree cannot be known in advance. We addressed this by specifying a flat list and appending infrastructure-specific values to each observation, enabling us to identify which pod or node it belongs to once the pipeline has completed.

(A flat list refers to a list structure where all elements are at the same level, typically arranged sequentially with no hierarchical relationship.)
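The flat-list approach can be sketched as follows: nested node and pod usage is flattened into one list, with the infrastructure identifiers copied onto each observation so results can be regrouped after the pipeline has run. The type names and fields (`node`, `namespace`, `pod`, `cpuMillicores`) are assumptions for this example and do not reflect the importer plugin's actual output format.

```typescript
interface PodUsage {
  namespace: string;
  pod: string;
  cpuMillicores: number;
}

interface NodeUsage {
  node: string;
  pods: PodUsage[];
}

// A flat observation carries the identifiers needed to locate it later.
interface FlatObservation extends PodUsage {
  node: string;
}

// Flatten the node -> pod hierarchy into a single-level list.
function flatten(nodes: NodeUsage[]): FlatObservation[] {
  const flat: FlatObservation[] = [];
  for (const n of nodes) {
    for (const p of n.pods) {
      flat.push({ node: n.node, ...p });
    }
  }
  return flat;
}

// After the pipeline completes, regroup observations by their node label.
function groupByNode(observations: FlatObservation[]): Map<string, FlatObservation[]> {
  const groups = new Map<string, FlatObservation[]>();
  for (const o of observations) {
    const bucket = groups.get(o.node);
    if (bucket) {
      bucket.push(o);
    } else {
      groups.set(o.node, [o]);
    }
  }
  return groups;
}

// Made-up cluster snapshot for demonstration.
const snapshot: NodeUsage[] = [
  { node: "node-a", pods: [
    { namespace: "default", pod: "web-1", cpuMillicores: 120 },
    { namespace: "default", pod: "web-2", cpuMillicores: 80 },
  ] },
  { node: "node-b", pods: [{ namespace: "monitoring", pod: "prom-0", cpuMillicores: 300 }] },
];
const flatObservations = flatten(snapshot);
const regrouped = groupByNode(flatObservations);
```

Because every observation is self-describing, pods that appear or disappear between runs need no change to the pipeline definition.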

Node Package Manager (NPM):

We encountered frustrating difficulties with NPM: it failed to install packages from GitHub correctly, rendering our newly developed plugins unusable. To circumvent this issue, we devised a workaround involving traditional git clones and NPM linking, ensuring all necessary files were included in the package upon installation.

Tests:

Understanding the testing framework within the plugin template repository proved challenging, given our team’s limited expertise in JavaScript and TypeScript development. Overcoming this hurdle required diligent debugging and familiarisation with the testing procedures.

Acronyms:

The learning curve associated with mastering the numerous acronyms linked to ESG, k8s, Prometheus, and Grafana proved to be steep.

Accomplishments

We tackled the challenges we faced head on and seamlessly integrated various components into our solution. Our solution’s impressive scalability underscores our team’s expertise. By integrating four complex components, namely IF, k8s, Prometheus and Grafana, we have demonstrated our team’s problem-solving abilities. Despite the numerous challenges we faced, our determination yielded a successful solution, showcasing our ability to tackle intricate problems. Not only did we create two IF plugins, but we also made our Grafana visualisation available as a JSON template, enabling widespread sharing and use.

Through targeted communication efforts we sparked the interest of both internal and external stakeholders. We raised awareness about Carbon Hack 24, emphasizing the environmental impact of software and highlighting the potential significance of leveraging Kubernetes for carbon reduction.

Learnings

Our project journey involved learning about the IF and tracking emissions through coding practices. Kubernetes, despite its complexity, proved to be invaluable. A notable challenge was the lack of publicly accessible ESG data for programmatic use. Effective time management and collaboration were achieved with a team of seven members through daily stand-ups and the use of Git, ensuring everyone was synchronized with the project’s progress. We individually expanded our knowledge of plugin frameworks and architecture and developed skills in crafting clear, comprehensible technical documentation.

Some additional learnings were highlighted through data analysis, which underscored the energy efficiency of macOS, which conserves power by shutting down when inactive. We have noted our assumptions, and from the data on the dashboard we can clearly see an increase in emissions whenever a more demanding workload was run. While this may seem trivial, it goes a long way towards validating the functionality of the system and helping us grasp the impact software has on the environment.

What's next?

Our proposed solution will bode well for large enterprises that use Kubernetes. With the rise of sustainability tracking and monitoring, being able to demonstrate and integrate the IF with current enterprise architectures will ensure long term use. At Nedbank, we’re looking towards implementing our solution in all our Kubernetes clusters (where possible) to enable us to track our carbon emissions more accurately and report on them timeously. This will contribute towards long-term adoption of the framework.

Through Kubernetes, we demonstrated how the IF can be used in real time. This functionality did not previously exist in the IF, and our solution is a proposal for the GSF to consider adding to it.

We believe our solution can contribute to the geographical footprint of the IF. Google Cloud recently launched its first cloud data centre in South Africa (SA), with research estimating that the cloud ecosystem could potentially contribute US$2bn to SA’s GDP and support the creation of 40k jobs by 2030. While it’s still early days, we believe our solution has the potential to be used as a base for the Africa leg of the ecosystem, and we hope to build a local community around green software using our solution as demonstration.

jawache commented 4 months ago

We've implemented a type of plugin which we call an exhaust plugin. Documentation is coming shortly, but this is some code that might be of use: https://github.com/Green-Software-Foundation/if/blob/dev/src/lib/exhaust.ts. It still needs some adjustment to dynamically load plugins from npm, but that should be there shortly. This is the approach I might suggest for publishing to Prometheus after the computation phase has completed.