Open nickbarber opened 8 months ago
This is such an important proposal. A lot has been written about CPU carbon emissions from computation, but emissions from storage/DB usage are an important gap we need to fill 🙏
Collating some of the sources we are looking at:
Prize category
Best Plugin
Overview
A plugin to estimate the carbon emissions of object storage based on a few factors such as:
Questions to be answered
No response
Have you got a project team yet?
Yes and we aren't recruiting
Project team
@nickbarber, @mgriffin-scottlogic, @jmain-scottlogic, @ishmael-burdeau
Terms of Participation
Project Submission
Summary
A series of plugins to estimate the energy used by data storage, particularly cloud object storage, as well as the impact of reading and writing data to storage devices. We have created four plugins that can be used in conjunction with each other, and with other plugins as needed: one to estimate the energy used by cloud object storage, one to return a replication factor multiplier based on the defaults of some cloud providers, one to estimate the energy consumed by stored data, and one to estimate the energy consumed by reading/writing data.
Problems
How to calculate the carbon emissions generated by data storage, including object (blob) storage, and by the reading and writing of data.
Application
The submission consists of multiple plugins, designed so that they can be used together or as separate components for non-cloud usage. One plugin retrieves the replication factor for cloud storage services. Another takes drive size and power, along with duration, data stored and the replication factor, to estimate the total energy associated with storage.
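As a rough illustration of how the replication lookup and the storage energy estimate could fit together, here is a minimal sketch. It is not the code from the repository: the function names, the service keys, the replication values and the formula itself (share of a drive occupied × drive power × duration × replication factor) are all assumptions made for this example.

```typescript
// Hypothetical replication defaults. These are placeholder values for
// illustration only, not verified figures for any cloud provider.
const REPLICATION_DEFAULTS: Record<string, number> = {
  'example-single-zone': 1,
  'example-multi-zone': 3,
};

// Look up a replication factor, falling back to 1 (no replication)
// when the service is unknown.
function replicationFactor(service: string): number {
  return REPLICATION_DEFAULTS[service] ?? 1;
}

interface StorageInput {
  driveSizeGb: number; // capacity of a single drive
  drivePowerW: number; // average power draw of a single drive
  durationHours: number; // length of the observation window
  dataStoredGb: number; // amount of data held during the window
  replicationFactor: number; // number of copies kept by the service
}

// Assumed model: energy scales with the fraction of a drive (or number
// of drives) the data occupies, multiplied by the number of copies.
function storageEnergyKwh(input: StorageInput): number {
  const driveShare = input.dataStoredGb / input.driveSizeGb;
  const wattHours = driveShare * input.drivePowerW * input.durationHours;
  return (wattHours * input.replicationFactor) / 1000;
}

// Example: 30 TB stored for one day on 10 TB drives drawing 6 W each,
// kept as three copies.
const energy = storageEnergyKwh({
  driveSizeGb: 10_000,
  drivePowerW: 6,
  durationHours: 24,
  dataStoredGb: 30_000,
  replicationFactor: replicationFactor('example-multi-zone'),
});
console.log(energy.toFixed(3)); // 1.296 (kWh)
```

Keeping the lookup separate from the energy calculation mirrors the split described above: a user with a single local drive can skip the lookup and pass a replication factor of 1.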
Prize category
Best Plugin
Judging criteria
Gives users a way to calculate the carbon emissions of their data storage, and has been created in such a way that it can be applied from single drives up to cloud managed services as long as the user has access to the relevant required data.
Video
https://www.youtube.com/watch?v=dGMidYCsEnk
Artefacts
https://github.com/mgriffin-scottlogic/if-carbon-hack-plugin
Usage
https://github.com/mgriffin-scottlogic/if-carbon-hack-plugin?tab=readme-ov-file#usage
Process
We tried to get data out of cloud services that would give us an indication of what was being used on the back end of cloud object storage services, but were unable to get much useful information. We used an open-source object storage system to test hypotheses we had about the impact of storage, and looked up research others had done on the topic. We therefore decided to create a plugin solution in its simplest form, to output the energy used by a storage device. We discovered that energy usage differs drastically between read/write and idle, and therefore split the plugins.
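To make the read/write versus idle split more concrete, the sketch below estimates the energy of a transfer from the time a drive spends active. This is only one possible model and is not taken from the repository: the function name, the parameters (throughput in MB/s, active power in watts) and the formula are assumptions for illustration.

```typescript
interface ReadWriteInput {
  dataTransferredGb: number; // data read or written during the window
  throughputMBps: number; // assumed sustained drive throughput
  activePowerW: number; // assumed power draw while reading/writing
}

// Assumed model: the drive is active for (data / throughput) seconds
// and draws its active power for that whole time.
function readWriteEnergyKwh(input: ReadWriteInput): number {
  const activeSeconds = (input.dataTransferredGb * 1000) / input.throughputMBps;
  const wattHours = (input.activePowerW * activeSeconds) / 3600;
  return wattHours / 1000;
}

// Example: writing 360 GB at 100 MB/s keeps the drive busy for one hour;
// at 8 W that is 8 Wh.
const rwEnergy = readWriteEnergyKwh({
  dataTransferredGb: 360,
  throughputMBps: 100,
  activePowerW: 8,
});
console.log(rwEnergy.toFixed(3)); // 0.008 (kWh)
```

Idle storage energy would be accounted for separately, so the two estimates can be summed or used on their own depending on what data the user has.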
Inspiration
Discussions between Scott Logic and DWP about what impact 30TB of data in S3 has on carbon emissions, and the fact that there was no clear way to calculate or estimate this outside of AWS reporting.
Challenges
Getting data out of cloud services and understanding what is happening at lower levels, especially with replication, redundancy and availability levels (e.g. intelligent tiering)
Finding the right places to break up plugins
Understanding the CPU/memory impact of object storage on top of the storage component, whether it is a hosted service or a system running locally
Accomplishments
Calculating an estimate within reasonable error of AWS's own reporting for Common Crawl
Making use of existing if-plugins to estimate the embodied carbon of storage
Learnings
Simply storing data has less impact than we expected. CPU usage for reading/writing etc. has a bigger impact. There is lots of scope to reuse the standard Impact Framework plugins in helpful ways (for example, reusing the embodied carbon plugin for data on drives).
What's next?
Further research into what information is required to get more detailed calculations for cloud object storage services.
Computation and memory overheads of object storage systems
Erasure coding vs replication
Automated duplication of observations for replicated regions