Green-Software-Foundation / hack

Carbon Hack 24 - The annual hackathon from the Green Software Foundation
https://grnsft.org/hack/github

Azure importer Extension Plugin #64

Open srini1978 opened 4 months ago

srini1978 commented 4 months ago

Type of project

Best Plugin

Overview

The Azure Importer is a plugin that calculates the Software Carbon Intensity (SCI) of workloads deployed on Azure services using the Impact Framework. The project extends the existing proof of concept (POC) of the Impact Framework for Azure VMs, which can be found here: https://if.greensoftware.foundation/tutorials/how-to-use-azure-vm

Based on our analysis, workloads running on any compute, storage, or network platform draw energy from the underlying host. This energy can be estimated from metrics such as CPU utilization and memory utilization. By measuring these metrics over a fixed time period and passing them through the IF computational pipeline, we can calculate the SCI score for these workloads on Azure.

The plugin will use the Azure Monitor API to collect observations from Azure resources and feed them into the Impact Framework computation pipeline. It will also support the batch API and multiple metrics for the SCI calculation.

The importer will take the subscription ID and resource group name as input and enumerate all the Azure services deployed in that resource group. For the hackathon, we aim to cover Azure Virtual Machines, Virtual Machine Scale Sets, Azure SQL, and, as a stretch goal, Azure Kubernetes Service.

From an Impact Framework perspective this is a plugin, but the plugin itself can be considered a framework for extension to more Azure services in the future.

Questions to be answered

No response

Have you got a project team yet?

Yes and we aren't recruiting

Project team

@srini1978 @yelghali @vineeth123-png

Terms of Participation


Project Submission

Summary

In the Impact Framework repository today, the Azure importer is available as an unofficial plugin that can calculate the SCI score for a single virtual machine. In this hack we extended the Azure importer to calculate SCI emission scores for all the resource types present in an Azure subscription, e.g. Azure SQL, Virtual Machine Scale Sets, Azure Functions, etc.

Problem

In most applications developed for Azure, the resource group and/or the subscription serve as the bounded context: all the components of the application are deployed entirely in a single resource group, or spread across multiple resource groups within the same subscription.

Hence, to automate the calculation of software emissions for an entire application built on Azure with the help of the Impact Framework, we should be able to provide just the subscription ID and have the pipeline calculate the end result. We cannot achieve this with the code in the unofficial-plugins repository today, because the released version of the Azure SDK for TypeScript (7.0.0) uses an older version of the Azure Monitor API (the 2018 version) that does not support retrieving observations for multiple resources in a subscription.

This is a known bug in the Azure SDK for TypeScript, tracked in this GitHub issue: https://github.com/Azure/azure-sdk-for-js/issues/29013

We addressed this issue as part of the hack: we worked with the Azure core SDK team to upgrade the API version being used, and a new version of the Azure SDK for TypeScript has been released (8.0.0-beta.5): https://www.npmjs.com/package/@azure/arm-monitor/v/8.0.0-beta.5
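To make the fix concrete, here is a minimal sketch of a subscription-scope metrics query. The endpoint shape and api-version follow the Azure Monitor REST API walkthrough linked later in this issue, but treat the exact query parameters as assumptions to verify against the current docs; when going through the SDK instead, the fix requires `npm install @azure/arm-monitor@8.0.0-beta.5`.

```typescript
// Sketch of a subscription-scope metrics query against the Azure Monitor
// REST API -- the capability the 2018 API version lacked.
// Requires Node 18+ (global fetch) and @azure/identity.
import { DefaultAzureCredential } from "@azure/identity";

async function listSubscriptionMetrics(
  subscriptionId: string,
  region: string,          // e.g. "eastus"
  metricNamespace: string, // e.g. "microsoft.compute/virtualmachines"
) {
  const credential = new DefaultAzureCredential();
  const token = await credential.getToken("https://management.azure.com/.default");

  // api-version and parameter names are assumptions; check the REST walkthrough.
  const url =
    `https://management.azure.com/subscriptions/${subscriptionId}` +
    `/providers/microsoft.insights/metrics` +
    `?api-version=2023-10-01&region=${region}` +
    `&metricnamespace=${encodeURIComponent(metricNamespace)}` +
    `&metricnames=${encodeURIComponent("Percentage CPU")}`;

  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${token.token}` },
  });
  if (!response.ok) throw new Error(`Metrics query failed: ${response.status}`);
  return response.json(); // one time series per matching resource
}
```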

Application

The Azure importer is a plugin on top of the Impact Framework that pulls telemetry (observations) from Azure services via the Azure Monitor API and feeds it into the IF computation pipeline to calculate the SCI score.

The following diagram illustrates the high-level architecture of the Azure importer plugin extension.

[High-level architecture diagram of the Azure importer plugin extension]

High level Design

The software components of the Azure importer plugin are described step by step in the manifest file design under Process below.

Prize category

Best Plugin

Judging Criteria

1) Since we are extending an existing plugin, we call this component the "Azure Importer Extension". The singular aim of the extension is to enable calculating software emissions from the Azure cloud at scale: by running the importer with just a subscription ID as input, a multitude of scenarios is enabled for developers, architects, and DevOps professionals.

2) This will open the door to measuring emissions for all types of software on the Azure cloud: batch jobs, serverless functions, managed services, Kubernetes nodes, storage queues, Synapse extensions, front-end apps, middle-tier applications, and distributed databases can all be included in the Azure importer pipeline.

3) Large enterprise customers who run all their systems on the Azure cloud can now calculate their end-to-end emissions by providing a single parameter.

4) The extension will also give an impetus to other hyperscalers to do similar implementations: GCP and AWS can use this as a reference implementation for their own cloud environments.

Video

A link to your video submission on YouTube

Artefacts

Link to the code or content

Usage

https://github.com/srini1978/if-unofficial-plugins-AzureHackers/blob/main/src/lib/azure-importer/README.md

Process

Manifest File design

The Impact Framework is a pipeline of model plugins that are chained to produce an output. This section describes the plugins that were leveraged, the input and output parameters used for each, and the processing done in each plugin.

Each plugin in the pipeline is listed below with its inputs, outputs, and the processing it performs.

1. Azure importer (extended)

Inputs:
  • subscriptionID
  • metric namespace

Outputs:
  • cpu/utilization; the final output is the list of metrics for all the resource types present in the subscription.

Process:
  • A call is made to the Azure management API to retrieve all the resource types (metric namespaces) present in the subscription, e.g. microsoft.compute/virtualmachines, microsoft.storage/storageaccounts, Microsoft.Sql/managedInstances.
  • For each of these resource types, a call is made to the Azure Monitor REST API, which takes the subscription ID along with the region name and metric namespace and returns all metrics associated with all resources of that resource type (a minimal sketch of the enumeration step follows).
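A sketch of the enumeration step, assuming @azure/arm-resources is used to discover the metric namespaces (the helper name is hypothetical; the per-namespace metrics call is sketched earlier in this issue):

```typescript
// Hypothetical sketch: enumerate the resource types (metric namespaces)
// present in a subscription, to drive the Azure Monitor queries.
import { DefaultAzureCredential } from "@azure/identity";
import { ResourceManagementClient } from "@azure/arm-resources";

async function listMetricNamespaces(subscriptionId: string): Promise<Set<string>> {
  const client = new ResourceManagementClient(new DefaultAzureCredential(), subscriptionId);
  const namespaces = new Set<string>();
  // resources.list() pages through every resource in the subscription.
  for await (const resource of client.resources.list()) {
    if (resource.type) namespaces.add(resource.type.toLowerCase()); // e.g. microsoft.compute/virtualmachines
  }
  return namespaces;
}
```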

2. Cloud Metadata

Inputs:
  • cloud/vendor
  • cloud/instance-type

Process:

Cloud Metadata is a standard IF plugin. Based on the cloud/vendor and cloud/instance-type inputs, it returns the thermal design power and the physical processor used by the underlying Azure instance (an example input row is shown below).

Outputs:
  • physical-processor
  • vcpus-allocated
  • vcpus-total
  • cpu/thermal-design-power
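For illustration, a hypothetical input row at this stage of the pipeline; the timestamp and instance type are made-up example values:

```yaml
inputs:
  - timestamp: '2024-03-01T00:00:00Z'
    duration: 3600               # seconds
    cloud/vendor: azure
    cloud/instance-type: Standard_D4s_v3
    cpu/utilization: 50          # from the Azure importer
```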

3. Teads Curve

Inputs:
  • vcpus-allocated (from Cloud Metadata)
  • vcpus-total (from Cloud Metadata)
  • cpu/utilization (from Azure importer)
  • cpu/thermal-design-power (from Cloud Metadata)

Process:

The Teads Curve model returns CPU energy in kWh given the TDP value of the processor. To get the TDP value, you need the name of the underlying processor, which is given by Cloud Metadata. If Cloud Metadata cannot supply it, we fall back to a default value of cpu/thermal-design-power: 100.

If vcpus-allocated and vcpus-total are available, they are used to scale the CPU energy usage. If they are not present, we assume the entire processor is being used. For example, if only 1 out of 64 available vCPUs is allocated, we scale the processor TDP by 1/64.
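A sketch of this scaling logic, assuming the published Teads coefficients (roughly 0.12, 0.32, 0.75, and 1.02 × TDP at 0/10/50/100% utilization) and linear interpolation; the actual plugin may use a spline and different constants, so treat these values as assumptions:

```typescript
// Utilization % -> share of TDP drawn (assumed Teads-style curve points).
const TDP_RATIO: Array<[number, number]> = [
  [0, 0.12], [10, 0.32], [50, 0.75], [100, 1.02],
];

function cpuEnergyKwh(
  utilizationPct: number, // cpu/utilization from the Azure importer
  tdpWatts: number,       // cpu/thermal-design-power (default 100)
  durationSec: number,    // observation window
  vcpusAllocated = 1,
  vcpusTotal = 1,
): number {
  // Linear interpolation between the surrounding curve points.
  let ratio = TDP_RATIO[TDP_RATIO.length - 1][1];
  for (let i = 1; i < TDP_RATIO.length; i++) {
    const [x0, y0] = TDP_RATIO[i - 1];
    const [x1, y1] = TDP_RATIO[i];
    if (utilizationPct <= x1) {
      ratio = y0 + ((utilizationPct - x0) / (x1 - x0)) * (y1 - y0);
      break;
    }
  }
  // Scale to the allocated share of the processor, then convert W*s -> kWh.
  const watts = tdpWatts * ratio * (vcpusAllocated / vcpusTotal);
  return (watts * durationSec) / 3_600_000;
}

// e.g. 1 of 64 vCPUs, 50% utilization, TDP 100 W, 1 hour:
// cpuEnergyKwh(50, 100, 3600, 1, 64) ~= 0.0012 kWh
```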

4. CCF

Inputs:
  • duration (from the input params)
  • cpu/utilization (from Azure importer)
  • cloud/vendor (from Azure importer)
  • cloud/instance-type (from Azure importer)

Process:

CCF (Cloud Carbon Footprint) is a community plugin built around a self-contained dataset. Given the input parameters, CCF returns the embodied emissions for the cloud/vendor and cloud/instance-type. Energy is also returned, but we ignore it because we take the energy value from the Teads Curve model.

Outputs:
  • carbon-embodied

5. SCI-M

Inputs:
  • vcpus-allocated (from Cloud Metadata)
  • vcpus-total (from Cloud Metadata)
  • carbon-embodied (from CCF)
  • device/expected-lifespan, which can be provided as a default value (3 years)

Outputs:
  • carbon-embodied, adjusted to the share of the machine and the time window used
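For reference, per the SCI specification the adjustment scales the total embodied emissions (TE) by the time share and the resource share, which with the inputs above works out to M = TE × (duration / device/expected-lifespan) × (vcpus-allocated / vcpus-total). For example, a one-hour observation on 1 of 64 vCPUs over a 3-year (94,608,000 s) lifespan scales TE by (3600 / 94608000) × (1 / 64).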

6. SCI-E

Inputs:
  • cpu/energy in kWh (from Teads Curve)

Process:
  • The SCI-E model converts cpu/energy to energy.

Outputs:
  • energy in kWh
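As a sanity check on units: as we understand the standard plugin, SCI-E simply sums whatever energy components are present (cpu/energy, memory/energy, network/energy), so with only cpu/energy in this pipeline, energy equals cpu/energy.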

7. WattTime

Inputs:
  • region/location where the workload is run

Outputs:
  • grid/carbon-intensity

8. SCI-O

Inputs:
  • energy in kWh (from SCI-E)
  • grid/carbon-intensity (from WattTime)

Outputs:
  • carbon-operational

Note: SCI-O must always be preceded by SCI-E in the pipeline.
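Operational carbon at this step is energy × grid/carbon-intensity. Putting the chain together, here is a minimal manifest sketch of the pipeline described above; the plugin names, methods, and package paths are illustrative assumptions, not the exact schema from the repository (see the README linked under Usage for the real file):

```yaml
# Hypothetical manifest sketch; verify names against the repository README.
name: azure-subscription-sci
description: SCI for every resource type in an Azure subscription
initialize:
  plugins:
    azure-importer:
      method: AzureImporter
      path: '@grnsft/if-unofficial-plugins'
    cloud-metadata:
      method: CloudMetadata
      path: '@grnsft/if-plugins'
    teads-curve:
      method: TeadsCurve
      path: '@grnsft/if-unofficial-plugins'
    ccf:
      method: CloudCarbonFootprint
      path: '@grnsft/if-unofficial-plugins'
    sci-m:
      method: SciM
      path: '@grnsft/if-plugins'
    sci-e:
      method: SciE
      path: '@grnsft/if-plugins'
    watt-time:
      method: WattTimeGridEmissions
      path: '@grnsft/if-unofficial-plugins'
    sci-o:
      method: SciO
      path: '@grnsft/if-plugins'
tree:
  children:
    subscription:
      pipeline:
        - azure-importer
        - cloud-metadata
        - teads-curve
        - ccf
        - sci-m
        - sci-e
        - watt-time
        - sci-o
      inputs: []
```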

Inspiration

Tell us what inspired you to develop the solution Max 150 words

Challenges

Share the challenges you ran into

Accomplishments

  1. With this hack, we have enabled anyone who wants to measure emissions from their Azure solutions to do so. The existing version of the Azure importer in the repo was blocked by a bug in the Azure SDK for TypeScript affecting how metrics could be retrieved from the Azure Monitor API.

To give some more detail: the Azure Monitor API can return metrics either for a single resource or for an entire subscription, but only the latest version of the API allows querying at the subscription level. The TypeScript SDK's code was never updated to the latest version, and hence it pointed at the 2018 version of the API instead of the 2023 version.

https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/rest-api-walkthrough?tabs=portal

Because of this, our API calls failed. We worked with the Azure TypeScript SDK team to point the SDK at the right API version. Bug fixed: https://github.com/Azure/azure-sdk-for-js/issues/29013

  2. I also provided mentorship to a team in Microsoft that works on data pipelines. They were inspired by this work and submitted a hack themselves: https://github.com/Green-Software-Foundation/hack/issues/87

Learnings

  1. Learnt how to chain the different plugins as part of the Impact Framework computational pipeline.
  2. TypeScript development and Node package usage were new learnings.
  3. Learnt how to design a manifest file.

What's next?

How will your solution contribute long term to the Impact Framework eco-system Max 200 words

tfsjohan commented 4 months ago

There is now a "better" way to get this data: the recently released Azure Carbon Optimization API.

srini1978 commented 4 months ago

The Azure Carbon Optimization API provides emission scores at the Azure service level. Its methodology is usage-based and focuses on calculating an emissions/usage factor. As described at https://learn.microsoft.com/en-us/industry/sustainability/api-calculation-method, emissions are allocated based on relative Azure usage in a given datacenter region: an algorithm calculates a usage factor that gives emissions per unit of customer usage in a specific Azure datacenter region, and emissions are then calculated directly from this factor. This attribution process is shown graphically in the scope 3 emissions allocation methodology image in that article.

The Azure importer, however, focuses on calculating emissions for the workloads deployed on your Azure services, e.g. the payment-processing job running on an Azure WebJob or Azure Function, rather than the entire WebJob itself. So it is the next level of granularity, one that is actionable for the developer. Also, the methodology used is not usage-based but relies on actual observations, namely Azure Monitor metrics like CPU %, memory %, etc.