Green-Software-Foundation / hack

Carbon Hack 24 - The annual hackathon from the Green Software Foundation
https://grnsft.org/hack/github

WaterWise #106

Open maddendeavor opened 2 months ago

maddendeavor commented 2 months ago

Prize category

Beyond Carbon

Overview

Our goal with this project is to look at the impact of cloud computing (datacenters) on water. We plan to build a WUE and/or human impact estimation based on the suggestion by @jawache here: https://github.com/Green-Software-Foundation/hack/discussions/82. As a stretch goal for the project, we would like to use the plugin to compare cloud deployments in various regions and contrast the results with the decisions you might make if only considering carbon impact or other factors.

Questions to be answered

TBD

Have you got a project team yet?

Yes and we aren't recruiting

Project team

@ridhee1gupta

Terms of Participation

maddendeavor commented 2 months ago

@jawache @russelltrow I have a question about the cloud-metadata plugin's gsf-data.csv. I noticed some discrepancies between a couple of the Azure data center locations listed in the spreadsheet and what I am finding on the Azure website (the locations of westus2 and westus3). What is the source of the spreadsheet, and how does it get updated? If we find discrepancies, should we try to fix them? Our initial thought for this project was to add the WUE data for specific data centers or regions to this spreadsheet, so we could make the updates while we are working on it. Thanks for the guidance!

jawache commented 2 months ago

Hi @maddendeavor, that is the perfect place to put this WUE data. Eventually we'd like to have PUE and WUE returned by this plugin.

We're manually maintaining this data ourselves so there very well could be errors in there that need fixing (there is no API etc... to get this data, we do what you are doing which is reading content online and making assumptions).

It would be better if you fork the repo and apply any changes to your fork, post hackathon we'd appreciate a PR to upstream your changes back to the main plugins repo.

Adding in @jmcook1186 and @manushak also for their thoughts.

jmcook1186 commented 2 months ago

Hi - yes, that dataset should be considered preliminary and it is scheduled for a proper audit (there's a ticket on our board to do this next sprint). If you find inconsistencies we'd love to know about them as it will help us with the maintenance. PR on this would be welcome, but please make sure we can see where you got your numbers from to help us evaluate. @maddendeavor @jawache

maddendeavor commented 2 months ago

Great! That was totally my plan and I have already started the fork. Yesterday I was trying to think of a way to have traceability for the numbers in the spreadsheet in general (perhaps a separate column for each data column containing the links?), but that would add more complexity when updating. I'll make sure all our sources are listed in the README at minimum.

ridhee1gupta commented 1 month ago

Summary

We created a modification to the cloud-metadata plugin that tells the user how much water a chosen data centre will use for the piece of software they want to host in that location. We used WUE data for Microsoft Azure (Azure) data centres where this information was available and the fleet-wide average for Amazon Web Services (AWS). Additionally, an extra script uses the YAML file that is output on successful completion to create some preliminary plots that give the user an idea of which data centre has higher carbon output and water usage.

Problems

This enhancement to the cloud-metadata plugin addresses the problem of not knowing how much water a piece of software would use while being hosted at a data centre. This amount depends on the CPU energy that the data centre uses for the specific software. Through cloud-metadata, we know the amount of carbon emitted due to the use of a specific data centre, but data centres use other natural resources to stay running. One of the biggest of these is water. Especially in areas that are water-starved, this creates a difficult trade-off between the economic gains that setting up data centres brings to a region and the amount of water that gets diverted for them to stay functional. To create green software that considers carbon emissions as well as the impact of software on other natural resources, a starting point is to know the amount of water being used to keep software running.

Application

Multiplying the water usage effectiveness (WUE) of a data centre, measured in L/kWh, by the energy used by the CPU, measured in kWh, gives the amount of water, in litres (L), that the data centre will use to store and run the piece of software we want to host in the “cloud”. This value is output alongside the rest of the information that the cloud-metadata plugin provides, such as the CPU energy used by the software, the carbon emitted for this software, and the physical processor specifications. We added one line to that output for the amount of water used by the software at a specific data centre. Our demo GitHub repository also contains a Python script that uses the output YAML file to produce preliminary plots showing how the selected data centres compare with each other on water usage and carbon emissions. It also shows which data centres have higher CPU energy usage, to explain where the values in the former plots come from. This allows users to visually compare data centres and choose the best one for their use case.
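As a minimal sketch of the calculation described above (not the plugin's actual code), the water estimate is a single multiplication. The function name `water_litres` and the example numbers are hypothetical and chosen only to illustrate the units.

```python
def water_litres(wue_l_per_kwh: float, energy_kwh: float) -> float:
    """Estimate water use in litres from WUE (L/kWh) and energy (kWh)."""
    if wue_l_per_kwh < 0 or energy_kwh < 0:
        raise ValueError("WUE and energy must be non-negative")
    return wue_l_per_kwh * energy_kwh


# Illustrative inputs only: a WUE of 1.8 L/kWh and 0.5 kWh of CPU energy.
example = water_litres(1.8, 0.5)
print(f"{example:.2f} L")
```

Because WUE is in L/kWh and energy is in kWh, the kWh cancel and the result is in litres.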

Prize Category

Beyond Carbon

Judging Criteria

This enhancement offers users a glimpse into how their program would use water in the data centre they decide to host it in. In addition to other considerations, this lets them weigh the impact on carbon emissions AND water usage in data centres, so they can choose a location for their software that is sustainable and has the least impact on our natural resources. This can lead to better decision making by engineers and programmers who are customers of cloud providers, and may also result in better transparency from cloud providers on the numbers behind their natural resource usage. The project is a start towards understanding water usage in data centres. We researched data reported by the three biggest cloud providers: Azure, AWS, and Google Cloud Platform (GCP). GCP isn’t used in our enhancement since cloud-metadata only gives information on Azure and AWS. All the research is backed by links pointing to exactly where we got the numbers for each data centre’s WUE. All data was taken from each company’s sustainability reports or websites.

Video

https://youtu.be/cv8gDPvJYzE

Artefacts

https://github.com/maddendeavor/if-plugins/tree/water-wise-cloudmeta-update

Usage

https://github.com/maddendeavor/water-wise-demo

Process

We performed internet searches to find data related to WUE for specific datacenters. We then familiarized ourselves with the IF framework, including following some of the trainings offered by the Green Software Foundation. We experimented with creating a standalone plugin as well as using the shell wrapper around a Python script, but ultimately decided that the simplest and most straightforward method for integrating the WUE data was to add it to the cloud-metadata plugin in the if-plugins repo. The WUE data could simply be added to the existing gsf-data.csv file and then exposed as a new output field. We tested by adding it to the manifest file and multiplying WUE by the energy usage. For any data centers where the WUE was unknown, we defaulted to the industry average of 1.8 L/kWh.
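The lookup-with-fallback step described above might look roughly like the following sketch. It is not the plugin's implementation (the plugin itself lives in the if-plugins repo): the column names `provider`, `region`, and `wue`, the region names, and the WUE values in `SAMPLE_CSV` are all placeholders invented for illustration, and the real gsf-data.csv layout may differ. Only the 1.8 L/kWh default comes from the text above.

```python
import csv
import io

DEFAULT_WUE_L_PER_KWH = 1.8  # industry-average fallback used when WUE is unknown

# Placeholder data: column names, regions, and WUE values are illustrative only.
SAMPLE_CSV = """provider,region,wue
azure,region-a,0.5
azure,region-b,
aws,region-c,0.25
"""


def load_wue_table(text):
    """Map (provider, region) to a WUE float, or None when the cell is empty."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row.get("wue") or "").strip()
        table[(row["provider"], row["region"])] = float(raw) if raw else None
    return table


def estimate_water(table, provider, region, energy_kwh):
    """Return litres of water, falling back to the default WUE if none is known."""
    wue = table.get((provider, region))
    if wue is None:  # covers both a missing row and an empty WUE cell
        wue = DEFAULT_WUE_L_PER_KWH
    return wue * energy_kwh


table = load_wue_table(SAMPLE_CSV)
print(estimate_water(table, "azure", "region-a", 2.0))  # uses the listed WUE
print(estimate_water(table, "azure", "region-b", 2.0))  # empty cell, uses default
```

Keeping the fallback in one place makes it easy to see which estimates rest on reported figures and which rest on the industry average.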

Inspiration

We were inspired by the draft of a project idea posted on GitHub. While doing our research, we realized that water usage always seemed to be treated as a secondary issue to power usage and carbon emissions. Given that water usage and waste are a challenging issue on a global scale, we were inspired by water-starved communities to investigate how sustainable companies were making their data centres. It was also interesting for us to see how transparent companies are with their data, and we wanted to use this to understand how reducing the space complexity of software could ultimately result in responsible water usage in data centres.

Challenges

There were two main challenges we faced in creating the modification. The first was data transparency and availability. Since companies are not legally obligated to release data centre specific information, or specific metrics that would help with practical calculations, the data released by companies varied. For example, GCP only released PUE data and water usage data but not WUE for its data centres. Similarly, AWS only released its WUE as a fleet-wide average across the globe, as opposed to data centre or even country specific information. The second challenge was familiarizing ourselves with the Impact Framework. Understanding how all the plugins would fit together to give us a satisfactory understanding of water usage was a challenge, but one that made us more responsible programmers.

Accomplishments

We accomplished our main goal of showing water usage data for the two main cloud service providers in the industry. We are proud that we were able to meet our goal and do it in a way where someone wouldn’t need to understand a whole new plugin. The fact that we were able to make it a part of the cloud-metadata plugin is a great accomplishment because we are able to give more information as part of a plugin made to give information on cloud providers. We also accomplished creating a script that gives us plots to be able to visually analyse carbon emissions and water usage. This is a great addition to the plugins created since it gives the user a quick idea of which server at which location is more environmentally friendly.
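To illustrate the kind of comparison the plots support, here is a small sketch that ranks data centres by carbon and by water without any plotting library. The `results` rows, field names, and numbers are placeholders, not measured values or the demo script's actual data structures.

```python
# Hypothetical per-region outputs; the numbers are placeholders, not measured data.
results = [
    {"region": "region-a", "carbon_g": 120.0, "water_l": 0.9},
    {"region": "region-b", "carbon_g": 95.0, "water_l": 1.4},
    {"region": "region-c", "carbon_g": 150.0, "water_l": 0.6},
]


def rank_by(rows, key):
    """Return region names ordered from lowest to highest value of `key`."""
    return [row["region"] for row in sorted(rows, key=lambda row: row[key])]


def best_by(rows, key):
    """Return the region with the lowest value of `key`."""
    return min(rows, key=lambda row: row[key])["region"]


print("lowest carbon:", best_by(results, "carbon_g"))
print("lowest water:", best_by(results, "water_l"))
print("water ranking:", rank_by(results, "water_l"))
```

With these placeholder numbers the lowest-carbon region and the lowest-water region differ, which is the trade-off a carbon-only comparison would hide.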

Learning

We learned a lot about using the IF framework and how the cloud-metadata plugin works. We were also able to review the existing data and found that some of the Azure data center locations were incorrect. Once we updated the file with the correct locations, we were able to see the differences in power and water usage between data centres. Overall, we were disappointed that we were not able to find more detailed information on WUE or specific water usage per data center for AWS. In our results, AWS currently appears to be the better option when compared to Azure. We suspect the fleet-wide AWS WUE is lower than that of some specific data centers, but so far we have not found data that granular. We also noticed that the lower value for AWS seems to be partly because they use more efficient chipsets than Azure.

What’s next?

We hope that incorporating water usage data for data centres using the WUE metric is just the first step in looking at the impact of software on water. We hope this solution will inspire further solutions that incorporate more data and give users the opportunity to make informed decisions about cloud providers and locations through data analysis. We expect this solution to be the first step in incorporating metrics beyond carbon in the Impact Framework ecosystem, and hopefully to give rise to plugins that can analyze water data better for various other use cases.