Green-Software-Foundation / hack

Carbon Hack 24 - The annual hackathon from the Green Software Foundation
https://grnsft.org/hack/github

Object storage Carbon estimates #101

Open nickbarber opened 6 months ago

nickbarber commented 6 months ago

Prize category

Best Plugin

Overview

A plugin to estimate the carbon emissions of object storage based on a few factors such as:

Questions to be answered

No response

Have you got a project team yet?

Yes and we aren't recruiting

Project team

@nickbarber, @mgriffin-scottlogic, @jmain-scottlogic, @ishmael-burdeau

Terms of Participation

Project Submission

Summary

A series of plugins to estimate the energy used by data storage, particularly cloud object storage, as well as the impact of reading and writing data to storage devices. We have created four plugins that can be used together, or combined with other plugins as needed: one estimates the energy used by cloud object storage, one returns a replication factor multiplier based on the defaults of some cloud providers, one estimates the energy consumed by stored data, and one estimates the energy consumed by reading and writing data.

Problems

How to calculate the carbon emissions generated by data storage, including object (blob) storage, and by the reading and writing of data.

Application

The application consists of multiple plugins, designed so that they can be used together or as separate components for non-cloud usage. One plugin retrieves the replication factor for cloud storage services. Another takes drive size and power, along with duration, data stored and the replication factor, to estimate the total energy associated with storage.
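As a rough illustration of how a replication factor lookup and a storage energy estimate might fit together, here is a minimal TypeScript sketch. It is not the team's implementation. The service names, the replication values, the function names and the formula (drive power scaled by the share of the drive in use, then multiplied by the replication factor) are all assumptions made for the example.

```typescript
// Hypothetical default replication factors. The keys and values are
// placeholders, not figures published by any cloud provider.
const DEFAULT_REPLICATION: Record<string, number> = {
  "example-standard": 3,
  "example-single-zone": 1,
};

// Inputs named in the description above; the field names are assumptions.
interface StorageInput {
  drivePowerWatts: number;   // average power draw of one drive
  driveSizeGB: number;       // capacity of that drive
  dataStoredGB: number;      // data held during the observation
  durationSeconds: number;   // length of the observation
  replicationFactor: number; // number of copies kept
}

// Look up a replication factor, falling back to a single copy
// when the service is not in the table.
function replicationFactorFor(service: string): number {
  const factor = DEFAULT_REPLICATION[service];
  return factor === undefined ? 1 : factor;
}

// Assumed model: attribute drive power in proportion to the share of
// the drive that the data occupies, then scale by the number of copies.
function storageEnergyKWh(input: StorageInput): number {
  const kilowatts = input.drivePowerWatts / 1000;
  const hours = input.durationSeconds / 3600;
  const driveShare = input.dataStoredGB / input.driveSizeGB;
  return kilowatts * hours * driveShare * input.replicationFactor;
}

// Example: 500 GB on a 1000 GB, 6 W drive for one hour, three copies.
const exampleEnergy = storageEnergyKWh({
  drivePowerWatts: 6,
  driveSizeGB: 1000,
  dataStoredGB: 500,
  durationSeconds: 3600,
  replicationFactor: replicationFactorFor("example-standard"),
});
console.log(exampleEnergy.toFixed(4), "kWh");
```

Keeping the lookup separate from the energy calculation mirrors the split described above: a non-cloud user can skip the lookup and pass a replication factor of 1.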

Prize category

Best Plugin

Judging criteria

The plugins give users a way to calculate the carbon emissions of their data storage, and have been created in such a way that they can be applied from single drives up to cloud managed services, as long as the user has access to the required data.

Video

https://www.youtube.com/watch?v=dGMidYCsEnk

Artefacts

https://github.com/mgriffin-scottlogic/if-carbon-hack-plugin

Usage

https://github.com/mgriffin-scottlogic/if-carbon-hack-plugin?tab=readme-ov-file#usage

Process

We tried to get data out of cloud services that would give us an indication of what was being used on the back end of cloud object storage services, but were unable to get much useful information. We used an open-source object storage system to test hypotheses we had about the impact of storage, and looked up research others had done on the topic. We therefore decided to create a plugin solution in its simplest form, to output the energy used by a storage device. We discovered that energy usage differs drastically between read/write and idle, and therefore split the plugins.
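The idle versus read/write split can be pictured with a small TypeScript sketch. This is an assumed model, not the logic of the submitted plugins: the field names, the function name and the idea of charging active seconds at one power level and the remaining seconds at another are all illustrative.

```typescript
// Hypothetical description of one drive over an observation window.
interface DriveActivity {
  idlePowerWatts: number;   // power draw while the drive is idle
  activePowerWatts: number; // power draw while reading or writing
  activeSeconds: number;    // time spent reading or writing
  durationSeconds: number;  // total length of the observation
}

// Assumed model: active time is charged at the active power level and
// the rest of the window at the idle power level.
function splitDriveEnergyKWh(a: DriveActivity): { idleKWh: number; activeKWh: number } {
  const idleSeconds = Math.max(a.durationSeconds - a.activeSeconds, 0);
  // Watt-seconds (joules) to kilowatt-hours.
  const toKWh = (watts: number, seconds: number): number => (watts * seconds) / 3600000;
  return {
    idleKWh: toKWh(a.idlePowerWatts, idleSeconds),
    activeKWh: toKWh(a.activePowerWatts, a.activeSeconds),
  };
}

// Example with made-up figures: 15 minutes of activity in a one-hour window.
const split = splitDriveEnergyKWh({
  idlePowerWatts: 4,
  activePowerWatts: 8,
  activeSeconds: 900,
  durationSeconds: 3600,
});
console.log(split.idleKWh.toFixed(4), split.activeKWh.toFixed(4));
```

Returning the two components separately keeps the stored-data estimate and the read/write estimate independent, so either can be used on its own.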

Inspiration

Discussions between Scott Logic and DWP about what impact 30TB of data in S3 has on carbon emissions, and the lack of a clear way to calculate or estimate this outside of AWS reporting.

Challenges

Getting data out of cloud services and understanding what is happening at lower levels, especially with replication, redundancy and availability levels (e.g. intelligent tiering).

Finding the right places to break up plugins.

Understanding the CPU/memory impact of object storage on top of the storage component, whether it is a hosted service or a system running locally.

Accomplishments

Calculating an estimate within reasonable error of AWS's own reporting for Common Crawl.

Making use of existing if-plugins to estimate the embodied carbon of storage.

Learnings

Simply storing data has less impact than we expected. CPU usage for reading, writing and similar operations has a bigger impact. There is a lot of scope to reuse the standard Impact Framework plugins in helpful ways (for example, reusing embodied carbon estimation for data on drives).

What's next?

Further research into what information is required to get more detailed calculations for cloud object storage services.

Computation and memory overheads of object storage systems

Erasure coding vs replication.

Automated duplication of observations for replicated regions

jawache commented 6 months ago

This is such an important proposal. Lots is written about carbon emissions from CPU computation, but emissions from storage/DB usage are an important gap we need to fill 🙏

mgriffin-scottlogic commented 6 months ago

Collating some of the sources we are looking at: