Green-Software-Foundation / hack

Carbon Hack 24 - The annual hackathon from the Green Software Foundation
https://grnsft.org/hack/github

Create a DataDog plugin to collect resource usage metrics #99

Open moin-oss opened 8 months ago

moin-oss commented 8 months ago

Prize category

Best Plugin

Overview

DataDog is a monitoring service, particularly useful for cloud applications, that collects a wide range of metrics for running and profiling applications. In particular, a user can collect resource usage metrics, such as processor, memory, and storage usage, which can be used to build an energy usage profile of an application. Let us create a plugin that captures these metrics from DataDog and uses them in the Impact Framework to generate an emissions profile.

Questions to be answered

No response

Have you got a project team yet?

Yes, and we are still open to extra members

Project team

@bvickers7, @chargao, @asibille-indeed, @GoethelTori, and @atilanog-indeed.

Terms of Participation

jmcook1186 commented 8 months ago

Great idea! This would be a very useful importer plugin!

russelltrow commented 7 months ago

Hi @moin-oss please don't forget to register your project: https://hack.greensoftware.foundation/register/

This gives you direct access to the Impact Framework team for your questions, as well as benefits from our community partners (Microsoft & Electricity Maps).

You must register your project before you can submit your solution for judging.

moin-oss commented 7 months ago

Current repo for this work: https://github.com/moin-oss/datadog-importer

bvickers7 commented 7 months ago

@moin-oss, who originally created this issue, is on vacation at the moment. As such, we are unable to edit the issue and are leaving the submission details as a comment.

Project Submission

Summary (100 words)

The Datadog Importer plugin pulls metrics from Datadog and formats them as inputs to later stages of Impact Framework calculations.

Problem (200 words)

When running the Impact Framework programmatically, manual input of usage metrics becomes a bottleneck. At our company, we are interested in providing application-level emissions reports. To do this, we cannot rely on manually created manifest files. Instead, we'd like to leverage an existing source of runtime metrics (Datadog) to provide the raw data for the early stages of Impact Framework calculations.

Application (200 words)

This plugin is designed to be used at the start of an Impact Framework calculation. Instead of manually compiling the list of inputs to the calculation, users can configure the plugin to pull observed usage metrics from Datadog, an observability solution used to track application metrics. For this MVP, we limited the implementation to CPU utilization percentage. The plugin is configured with details of how the metrics are set up for the organization. Inputs identify a node and a time window for which metrics are to be gathered. The output represents the values of the metric spaced out over the observation window.
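To make the config/input/output relationship concrete, here is a minimal sketch in TypeScript. The field names (`metricName`, `tagKey`, `instance-id`, `cpu/utilization`, etc.) are illustrative assumptions for this write-up, not the plugin's actual schema; see the repository README for the real usage.

```typescript
// Hypothetical shapes only -- field names are illustrative, not the plugin's actual schema.

// Plugin-level configuration: how metrics are named and tagged in the
// organization's Datadog account.
interface DatadogImporterConfig {
  metricName: string; // e.g. the CPU utilization metric emitted by the org's agents
  tagKey: string;     // tag used to identify a node, e.g. "host" or "pod_name"
}

// One input row: which node to query and over what time window.
interface ImporterInput {
  'instance-id': string; // value of the tag identifying the node
  timestamp: string;     // ISO 8601 start of the observation window
  duration: number;      // window length in seconds
}

// One output row: the observed metric value at a point inside the window.
interface ImporterOutput extends ImporterInput {
  'cpu/utilization': number;
}

// Example: a one-hour input window expanded into per-interval observations.
const input: ImporterInput = {
  'instance-id': 'web-7f9c',
  timestamp: '2024-03-01T00:00:00Z',
  duration: 3600,
};

const outputs: ImporterOutput[] = [
  { ...input, timestamp: '2024-03-01T00:00:00Z', duration: 300, 'cpu/utilization': 42.1 },
  { ...input, timestamp: '2024-03-01T00:05:00Z', duration: 300, 'cpu/utilization': 38.7 },
  // ...one row per interval returned by Datadog across the hour
];
```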

Prize category

Best Plugin

Judging Criteria (200 words per section)

Overall Impact

The overall impact of this plugin is that it would make it significantly easier to measure the environmental impact of applications that are already configured to emit performance metrics to Datadog. This could make adoption and usage of the Impact Framework more widespread by making the framework easier to integrate into existing systems.

Opportunity

Datadog is a very popular observability platform used by many companies as their internal monitoring system. An easy-to-use plugin that maps Datadog metrics to Impact Framework inputs creates a large opportunity to cover a variety of application types that may not currently have a dedicated plugin.

Modular

As initially implemented, the plugin is limited to CPU metrics; however, it can be configured for different metric names and tags. Additionally, it is not biased towards on-premises deployments or any single cloud provider. By using metrics that have already been standardized within Datadog, the plugin becomes more flexible.

Video

https://youtu.be/k2SHStURdOw

Artefacts

https://github.com/moin-oss/datadog-importer

Usage

https://github.com/moin-oss/datadog-importer/blob/main/README.md

Process (150 words)

We relied heavily on the prior work of the azure-importer plugin for deciding how our configuration, inputs, and outputs would be structured. From there, we referenced Datadog API documentation to query and parse the metrics from within our plugin.
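As a rough sketch of the kind of query involved (not the plugin's exact code), the following assumes the official `@datadog/datadog-api-client` Node.js client; the metric name `system.cpu.user` and the `host` tag are illustrative, which is exactly why the plugin keeps metric names and tags configurable.

```typescript
import { client, v1 } from '@datadog/datadog-api-client';

// Credentials are typically supplied via the DD_API_KEY / DD_APP_KEY environment variables.
const configuration = client.createConfiguration();
const metricsApi = new v1.MetricsApi(configuration);

// Query average CPU usage for one host over a time window (Unix seconds).
// The metric name and tag here are illustrative, not the plugin's fixed choices.
async function queryCpuUtilization(host: string, fromSec: number, toSec: number) {
  const response = await metricsApi.queryMetrics({
    from: fromSec,
    to: toSec,
    query: `avg:system.cpu.user{host:${host}}`,
  });

  // Each returned series holds [timestamp in ms, value] pairs spread across the window.
  const points = response.series?.[0]?.pointlist ?? [];
  return points.map(([ts, value]) => ({
    timestamp: new Date(ts).toISOString(),
    'cpu/utilization': value,
  }));
}
```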

Inspiration (150 words)

Our team consists of employees of a company that uses Datadog as an observability platform. As we have looked into using IF to measure our applications, we've relied on manually copying inputs from Datadog to manifest files. To speed up this process and to open the door to more generic reporting, we saw value in a plugin that could programmatically pull metrics from Datadog.

Challenges (150 words)

We struggled to balance delivering a working MVP against designing a more generic solution. We were careful to limit the extent to which details of our runtime and monitoring setup are baked into the plugin, and longer term we'd like to make it more generic. Additionally, it was unclear to us whether the framework supports expanding a single input into outputs spanning multiple child nodes.

Accomplishments (150 words)

We successfully implemented the core functionality of the plugin. We have greatly reduced manual toil involved with performing IF calculations at our company and taken a step towards dynamically generating application level emissions reports.

Learnings (150 words)

Usage metrics of modern applications can be challenging to quantify due to the complex interactions between containerization technology like Docker, deployment management systems like Kubernetes (k8s), cloud providers like AWS, and even per-application deployment customizations. Getting accurate metrics relies on a careful balance between making good assumptions where possible and digging deeply into specifics when necessary.

What's next? (200 words)

There are three main feature areas in which we'd like to improve the plugin to make it more generally useful:

  1. Allow users to configure generic output metrics, not just CPU utilization percentage.
  2. Allow users to control how the time window from the input is split across the output.
  3. Allow users to specify generic facets on which the outputs are grouped, and output multiple child nodes where each child is a unique combination of values for those facets. In simpler terms, output multiple children based on a set of tags (see the sketch after this list).
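One possible shape for that facet-based grouping is sketched below. This is hypothetical future-work pseudocode made concrete in TypeScript, not part of the current plugin; the tag names (`service`, `region`) are only examples.

```typescript
// Hypothetical sketch: produce one child node per unique combination of facet
// values (e.g. service + region), keyed by those values.
interface MetricRow {
  tags: Record<string, string>; // e.g. { service: 'checkout', region: 'us-east-1' }
  timestamp: string;
  'cpu/utilization': number;
}

function groupByFacets(rows: MetricRow[], facets: string[]): Map<string, MetricRow[]> {
  const children = new Map<string, MetricRow[]>();
  for (const row of rows) {
    // Build a stable child-node key such as "service=checkout,region=us-east-1".
    const key = facets.map((f) => `${f}=${row.tags[f] ?? 'unknown'}`).join(',');
    const bucket = children.get(key) ?? [];
    bucket.push(row);
    children.set(key, bucket);
  }
  return children;
}
```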