Type of project
Building a plug-in for Impact Framework
Overview
User Story
As an IF user, I want to be able to create and use custom CPU power curves in IF so that I can calculate the carbon emissions of my system based on its specific power consumption model.
Rationale
Based on our work on the Intel model (IEE, the "Intel Energy Estimator"), we know that many clients (private users, OEMs and others) run customized systems whose power consumption characteristics (power curves) may differ from the generic ones provided by manufacturers, if any are provided at all. The idea is to provide a model where users only need to plug in their custom power curve as a data file; the model then carries the burden of energy calculation, among other things, at runtime. Such a model would be backed by a script that generates a power curve on a given system.
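To make the "plug in a power curve, let the model compute energy" idea concrete, here is a minimal sketch in Python. It is not the plugin's actual code: the curve values, the function names `interpolate_power` and `energy_kwh`, and the choice of linear interpolation between measured points are all assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical power curve: (CPU utilization in %, power in watts),
# sorted by utilization. Real values would come from the user's data file.
EXAMPLE_CURVE = [(0.0, 50.0), (10.0, 80.0), (50.0, 150.0), (100.0, 200.0)]


def interpolate_power(curve, utilization):
    """Estimate power (W) at `utilization` (%) by linear interpolation."""
    xs = [u for u, _ in curve]
    ys = [p for _, p in curve]
    if not xs[0] <= utilization <= xs[-1]:
        raise ValueError("utilization is outside the range covered by the curve")
    i = bisect_left(xs, utilization)
    if xs[i] == utilization:
        # Exact match with a measured point: no interpolation needed.
        return ys[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (utilization - x0) / (x1 - x0)


def energy_kwh(curve, utilization, duration_s):
    """Energy (kWh) for `duration_s` seconds at a constant utilization."""
    watts = interpolate_power(curve, utilization)
    return watts * duration_s / 3_600_000  # W*s (joules) -> kWh
```

With the example curve, one hour at 30% utilization interpolates to 115 W, which is about 0.115 kWh. A real pipeline would apply this per observation window and sum the results.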
Impact
Successfully implementing this idea will:
Significantly simplify getting started with IF for those who want a pipeline that accurately reflects their specific system modifications and customizations. Given the importance of CPU modeling in the manifest file, this will improve onboarding for such users: they can create their own power curves and then provide only the resulting data to the model, without writing any code.
Increase transparency under a unified standard. Once all CPU-modeling users use the same Generic CPU Power Curve generation script and model, we know that although the data naturally differs between users, the process of creating the power curves and the calculations based on those curves are the same for everyone. Even if users can't share their data (power curves), we at least have transparency over their computations.
Discussion
This project idea originated from this discussion: https://github.com/Green-Software-Foundation/hack/discussions/37
Questions to be answered
No response
Have you got a project team yet?
Yes - "The GreenChips" :)
Project team
@ajagann - Akshaya Jagannadharao
@greeliyahu - Eli Greenberg
@dgolive - Danilo Oliveira
@pazbardanl - Paz Barda
@OrhenG - Orhen Oren Greenberg
Terms of Participation
Submission Content:
Summary
Our project aims to measure the energy consumption of software execution on a CPU, or on any other device or component that can be characterized by a power curve (power load line). An innovative part of the project is tackling the challenge of generating such power curves for any CPU on any given system.
Generic CPU IF Plugin: https://github.com/pazbardanl/if-plugins/tree/generic-cpu
Power curve generator: https://github.com/ajagann/powercurve-generator/tree/feature/update_polling_result_integ
Problems
The current industry standard for generating power loadlines is to use SPECPower or SERT. However, these tools are not universally accessible and come with their own limitations. For instance, they require direct access to hardware for measurements and operate within highly structured benchmarking environments that may not fully replicate real-world scenarios. One significant limitation we aim to address is the reliance of SPECPower and SERT on a set of standard benchmarks, which may not adequately cover all software scenarios.
Open-source solutions like Teads and CodeCarbon also use loadlines in their calculations, which shows the growing recognition of loadline-based methodologies in energy consumption estimation. However, these solutions may have their own limitations or may not provide the flexibility required for certain user scenarios.
In addition, the plugin we're submitting offers a simple and generic way to use power curves (of any device, not just CPUs) in the Impact Framework.
Application
To address these limitations and provide a more accessible solution, we propose a framework that enables users to generate their own power loadlines for personal use. Our goal is an extensible and flexible framework that lets users generate power loadlines tailored to their specific systems. While our framework may not achieve the same level of accuracy as SPECPower or SERT, it provides developers with energy estimates that are more likely to reflect their own situations. Once they have power loadlines (power curves), users can leverage the simplicity and effectiveness of IF to measure the energy consumption of any software workload.
Prize category:
Best Plugin
Judging criteria
Impact on the broader sustainability movement
CPUs carry a significant part of the software workload. The ability to accurately calculate their energy consumption is crucial to the growth of software energy measurability.
What things need to happen for that impact to occur?
In our vision, vendor-generated power curves are important for transparency and provide very good accuracy. But end-user systems often deviate from the generic, "clean" power curve generated in the vendor's lab, due to customizations, aging effects and custom workloads that are not well represented by the generic curves. For this impact to reach its full potential, we envision users running the Impact Framework and feeding it their own custom-generated power curves to get a more accurate energy measurement.
Opportunity
Given the tool to generate custom power curves and the IF plugin to use these curves, users can now measure energy for any software platform, regardless of whether vendor-provided data is available.
Modular
Our IF plugin complies with the IF standard and interfaces. Our power-curve generator tool can execute any benchmark, be it a standard or a custom one.
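One way to picture the "any benchmark, standard or custom" claim is a pair of small interfaces that benchmarks and power readers implement. The sketch below is an assumption for illustration only: the names `Benchmark`, `PowerPoller`, `measure_point` and the stand-in classes are hypothetical and are not taken from the generator's code.

```python
from typing import Optional, Protocol, Tuple


class Benchmark(Protocol):
    """Anything that can load the system to a target utilization (%)."""

    def run(self, target_utilization: float) -> None: ...


class PowerPoller(Protocol):
    """Anything that can report the current power draw in watts."""

    def read_watts(self) -> float: ...


class RecordingBenchmark:
    """Stand-in benchmark: records the requested load instead of running one."""

    def __init__(self) -> None:
        self.last_target: Optional[float] = None

    def run(self, target_utilization: float) -> None:
        self.last_target = target_utilization


class FakePoller:
    """Stand-in poller: invents a linear power model for demonstration."""

    def __init__(self, benchmark: RecordingBenchmark) -> None:
        self.benchmark = benchmark

    def read_watts(self) -> float:
        return 50.0 + 1.5 * (self.benchmark.last_target or 0.0)


def measure_point(
    benchmark: Benchmark, poller: PowerPoller, target: float
) -> Tuple[float, float]:
    """Run one benchmark step and return a (utilization, watts) curve point."""
    benchmark.run(target)
    return target, poller.read_watts()
```

Because `measure_point` depends only on the two protocols, a custom benchmark or a different power-reading method can be swapped in without changing the measurement loop.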
Video
https://www.youtube.com/watch?v=BWZpbMFQADA
Artefacts:
Generic CPU IF Plugin: https://github.com/pazbardanl/if-plugins/tree/generic-cpu
Power curve generator: https://github.com/ajagann/powercurve-generator/tree/feature/update_polling_result_integ
Usage
Please refer to the respective README files of the artefacts above
Process
We started by studying how SPECPower and SERT generate power curves. Their process begins with a calibration phase that determines the workload needed to fully utilize all available CPU resources, which is marked as 100% utilization. They then measure power consumption while executing this workload. Next, they iteratively reduce the workload to reach lower utilization levels, measuring power consumption at each step. For instance, to determine the workload at 90% utilization, they take 90% of the workload found at 100% utilization and measure the corresponding power.
Inspired by this methodology, we structured our benchmark implementation accordingly. Our next task was to develop a flexible framework for future customization. The framework aims to accommodate diverse power polling methods and benchmarks. Our code structure allows users to create their own benchmarks, so they can adjust granularity and exploit hardware optimizations.
IF plugin development followed the IF plugin interface and development guidelines.
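The calibrate-then-step-down procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the generator's implementation: `SimulatedMachine`, `calibrate_full_load` and `generate_curve` are hypothetical names, the machine is simulated rather than measured, and the doubling search is only one possible calibration strategy.

```python
class SimulatedMachine:
    """Hypothetical stand-in for a real system, used only for this sketch."""

    CAPACITY_OPS = 64_000  # workload size that saturates the simulated CPU

    def __init__(self):
        self.utilization = 0.0

    def run_workload(self, ops):
        """Pretend to run `ops` operations; return the utilization reached (%)."""
        self.utilization = min(100.0, 100.0 * ops / self.CAPACITY_OPS)
        return self.utilization

    def read_watts(self):
        """Pretend power reading: an invented linear model of utilization."""
        return 60.0 + 1.5 * self.utilization


def calibrate_full_load(run_workload, start_ops=1_000, saturation=99.0,
                        max_doublings=30):
    """Double the workload until utilization reaches the saturation level.

    Doubling can overshoot the true saturation point; a real calibration
    would refine the result, for example with a binary search.
    """
    ops = start_ops
    for _ in range(max_doublings):
        if run_workload(ops) >= saturation:
            return ops
        ops *= 2
    raise RuntimeError("could not saturate the CPU")


def generate_curve(run_workload, read_watts, full_ops, step=10):
    """Measure power at 100%, 90%, ... of the calibrated full workload."""
    curve = []
    for level in range(100, 0, -step):
        run_workload(full_ops * level // 100)  # e.g. 90% of the 100% workload
        curve.append((level, read_watts()))
    return sorted(curve)  # ascending by utilization level
```

On real hardware, power would be sampled while the workload is running rather than after it returns, and each level would typically be held long enough for readings to stabilize.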
Inspiration
Our quest to gauge the carbon footprint and energy consumption of our software led us to explore open-source solutions. We were impressed by the efforts of Teads and Boavizta, but closer scrutiny showed that they rely on power curves for estimation, and the methods used to generate these curves raised doubts. While power curves and TDP are crucial for energy calculations, their assumptions often don't hold in real-world scenarios. Typically crafted in controlled environments, these curves may not align with the diverse workloads we sought to measure. This disconnect between idealized settings and practical application left us with unresolved questions.
Challenges
Our team is globally dispersed, which led to the usual communication challenges. Python proficiency also varied among team members, and some took the opportunity to improve their skills in the language. Furthermore, while most of us are not power experts, we were fortunate to have a team member with substantial knowledge in this domain. Bringing the team up to speed on the issue and devising an approach required considerable trial and error. Moreover, having two components in the same project meant significant time and effort spent on integration and troubleshooting.
Accomplishments
We are incredibly proud of achieving our intended functionality. Throughout this journey, we've gained valuable insights into the realm of power management and significantly expanded our proficiency in Python. While acknowledging that there's more to accomplish, we celebrate the solid foundation we've laid down with our framework.
Learnings
The key lesson from our hackathon experience is how complex power and energy measurement is. Although the field is well defined and its challenges are familiar, addressing them meaningfully proved difficult. We ran into hurdles such as the inability to isolate processes within the machine and the difficulty of ensuring measurement reliability without physical measurement devices. In response, we recognized the need to document our assumptions and to devise strategies for addressing potential sources of error in future work.
What's next?
Our next steps involve comprehensive testing across various environments, encompassing containerized environments and diverse bare-metal setups, to ensure the robustness and adaptability of our framework. We aim to delve deeper into power measurement methodologies for applications running on single or multiple sockets, considering potential code modifications to accurately track their performance. Additionally, we will rigorously test the functionality for hyperthreaded applications, anticipating adjustments to accommodate their nuances effectively. Enhancing support for CLI and extending compatibility to AMD processors are also pivotal tasks on our agenda. Moreover, we plan to establish a robust suite of test cases and implement continuous integration practices to fortify our framework's reliability. Concurrently, we are committed to broadening our understanding of power dynamics, seeking insights into the underlying reasons behind observed results. We also aspire to enhance the flexibility of our scripts, ensuring they dynamically adapt to the benchmark's information requirements, optimizing their utility and versatility. In general, we need to increase the robustness of the framework.