Was there any discussion on this in the previous weekly meeting @buchananwp ?
@atg-abhishek thanks for checking. We presented this today in the call. Next steps:
This is great @buchananwp - let me know which of the steps I can help with :)
Move to the project template and validate with the WG
Prototype here: http://azure-uw-cli-2021.azurewebsites.net/docs_page
Momentum
We have several internal MSFT pilots to use this tool, and intend this as an expression of the SCI spec we're developing.
Overview
To enable Microsoft and GSF stakeholders to make smart decisions about their environmental impact and carbon footprint, we have created the Carbon Aware API to minimize the carbon emissions of computational workflows. A few key features of the API are:
Details and Methodology
Marginal Carbon Emissions: A grid-responsive metric with finer granularity than average emissions, capturing the seasonal and diurnal trends that matter for demand shifting (source).
Retrospective Analysis: Time-series evaluation to assess the carbon emissions of a given energy profile. Also provides counterfactual analysis to expose the potential emissions if the run had been shifted.
Geographic: Finds the region with the current lowest average carbon intensity for an immediate run of a specified duration. Can filter available regions by available SKU and by migration laws for workspaces with protected data.
Regional Carbon Intensity: Provides the carbon intensity for each data center supported by a WattTime-tracked balancing authority. The possible scopes are historic intensities (time series for the prior 24 hours, week, and month), real-time marginal intensity, and forecast (mean intensity for an upcoming user-defined window).
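To make the endpoints above concrete, here's a rough sketch of what a client call could look like. The route, parameters, and response fields are placeholders made up for illustration, not the prototype's actual interface.

```python
# Hypothetical example: querying a regional carbon-intensity forecast.
# Route name, parameters, and response shape are illustrative only and
# may not match the prototype's real API.
import requests

BASE_URL = "http://azure-uw-cli-2021.azurewebsites.net"  # prototype host linked above

resp = requests.get(
    f"{BASE_URL}/carbon-intensity/forecast",          # illustrative route
    params={"region": "westus", "window_hours": 6},   # illustrative parameters
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"region": "westus", "mean_intensity_gco2_per_kwh": ...}
```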
Funding/Support Needed: $100k
Scheduling and Logging: Link to the Global Job Dispatcher for a carbon-aware cron scheduler for ML workspaces (a rough scheduling sketch follows this list). Need to create a logging system to track recommendation uptake and performance.
Refactoring: Currently built within a Flask framework. Need to refactor endpoints for improved readability, latency reduction, and robustness. Need to add testing and CI/CD for improved engineering standards. In progress, with completion in September.
OSS and Build to Scale: The refactored script needs to be converted from Python to a lower-level language in order to scale. To release the Carbon Aware API as a viable open-source scheduling tool with support for multiple input sources and infrastructure agnosticism, additional resources are needed to restructure it in a language compatible with the Azure stack (e.g., C# or C++).
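For the scheduling item above, here is a minimal sketch of the kind of decision logic the cron scheduler would need, assuming a hypothetical forecast feed from the API. Function names and data shapes are illustrative only.

```python
# Sketch of a carbon-aware scheduling decision, assuming the caller has already
# fetched a forecast as (start_time, intensity) pairs from the API.
from datetime import datetime, timedelta

def pick_lowest_carbon_start(forecast, duration_hours, deadline):
    """Return the forecast window start with the lowest intensity that still
    lets the job finish before the deadline, or None if nothing is feasible."""
    feasible = [
        (start, intensity)
        for start, intensity in forecast
        if start + timedelta(hours=duration_hours) <= deadline
    ]
    if not feasible:
        return None  # caller falls back to running immediately
    return min(feasible, key=lambda pair: pair[1])[0]

# Example usage with dummy forecast data (gCO2eq/kWh):
now = datetime.utcnow()
forecast = [(now + timedelta(hours=h), 400 - 20 * h) for h in range(8)]
start = pick_lowest_carbon_start(forecast, duration_hours=2,
                                 deadline=now + timedelta(hours=10))
print("Recommended start:", start)
```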
OKRs
Objective: As a tool implementing the GSF’s Software Carbon Intensity (SCI) specification, we seek to build community engagement and awareness through an OSS carbon-aware API that can be extended to other cloud providers & data sources.
Key Result: A hosted API capable of handling client requests at scale, enabling impactful carbon reductions in line with the methodology of the SCI.
Goal with getting to OSS
The carbon-aware API becomes the standardized way to enable (change behavior of) developers to time- and region-shift their computing loads to generate carbon emissions savings. This extensible toolkit is multi-cloud compatible, and enables cron-based scheduling of workloads.
This standardization can be through an approach to implement the Software Carbon Intensity (SCI) from the Green Software Foundation (GSF) as a starting point and perhaps being flexible enough to support other standards as well in the future. Though hopefully there won’t be too many!
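For context, a back-of-the-envelope sketch of how the API's intensity data could feed an SCI-style score, assuming the spec's general formulation of operational plus embodied emissions per functional unit. Treat the exact formula as a placeholder until the spec is finalized.

```python
# Assumed SCI-style formulation: SCI = ((E * I) + M) per R, where E is energy
# consumed (kWh), I is the location-based marginal intensity (gCO2eq/kWh) this
# API exposes, M is embodied emissions (gCO2eq), and R is the functional unit.
def sci_score(energy_kwh, marginal_intensity, embodied_g, functional_units):
    return ((energy_kwh * marginal_intensity) + embodied_g) / functional_units

# e.g. per-1000-requests score for a job drawing 1.2 kWh at 350 gCO2eq/kWh:
print(sci_score(energy_kwh=1.2, marginal_intensity=350.0,
                embodied_g=50.0, functional_units=1000))
```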
Some requirements that we need to meet to get to a solid OSS release
Based on the codebase that we have now and the discussions with Taylor, some of the key things that I think we’ll need to address are as follows:
Engineering Standards
Testing (unit, system, and integration)
Flexibility of code to be adapted by the OSS community
Modularity of code to tackle:
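As a starting point for the testing item above, here is a minimal pytest sketch. The app factory and endpoint are self-contained stand-ins so the example runs on its own; they do not reflect the real module layout.

```python
# Illustrative unit test for a Flask endpoint; create_app() and the route
# below are hypothetical stand-ins for the project's actual application.
import pytest
from flask import Flask, jsonify

def create_app():
    # Minimal stand-in app; the real project would import its existing Flask app.
    app = Flask(__name__)

    @app.route("/carbon-intensity/<region>")
    def intensity(region):
        return jsonify({"region": region, "intensity_gco2_per_kwh": 123.4})

    return app

@pytest.fixture
def client():
    app = create_app()
    app.testing = True
    return app.test_client()

def test_intensity_endpoint_returns_region(client):
    resp = client.get("/carbon-intensity/westus")
    assert resp.status_code == 200
    assert resp.get_json()["region"] == "westus"
```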
Tracking generated impact
Carbon-counterfactual: I discussed this with Taylor in terms of building in instrumentation and gathering telemetry on what action the user took, so that we know whether they found the suggested action useful and acted on it.
This will require additions in the UI as well where we have:
A running counter showing something like: “27 other users shifted their computing loads in the past hour and saved 512 kg CO2eq, that’s equal to 3 fewer cars going from X to Y”
This is going to be crucial if we want to trigger behavior change. It is on the list for getting to OSS because hard numbers demonstrate the tool's usefulness more strongly and let us publish aggregate figures, which can drive more interest in the project and its uptake by developer communities across regions and platform choices.
This will also give anybody trying to make a business case for funding more concrete data on how many people are using the tool and whether it actually triggers change.
If we go down this path, we might need a centrally hosted analysis component that gathers this telemetry from the various users and sends aggregated stats back to those using the tool, to power the notification mentioned above.
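If we do build that central component, something along these lines could work as a first pass. The endpoints, fields, and in-memory store are illustrative placeholders rather than a concrete design; a real deployment would persist events and authenticate clients.

```python
# Sketch of the telemetry idea above: record whether a user acted on a
# recommendation and serve aggregate savings for the UI counter.
from flask import Flask, request, jsonify

app = Flask(__name__)
events = []  # in-memory stand-in for a real telemetry store

@app.route("/telemetry/recommendation", methods=["POST"])
def record_recommendation_outcome():
    payload = request.get_json()
    events.append({
        "accepted": bool(payload.get("accepted")),
        "estimated_savings_kg": float(payload.get("estimated_savings_kg", 0.0)),
    })
    return jsonify({"status": "recorded"}), 201

@app.route("/telemetry/summary")
def summary():
    accepted = [e for e in events if e["accepted"]]
    return jsonify({
        "shifts_recorded": len(accepted),
        "total_savings_kg": sum(e["estimated_savings_kg"] for e in accepted),
    })
```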
Moving over the project roadmaps to “GitHub Projects”
Contribution guidelines
Licenses
I think v2 of the OSS can migrate to a different language for scalability. It would be good to take the Python codebase we have now as far as we can first, glean insights into what the target audience actually wants from it, and then use those insights to build something in C#/C++ as indicated in the other document.