[Investigative Submission]: Energy per GB of Networking

DanniBradu commented 2 years ago

Contact Details

srrakhun@microsoft.com

Data request?

Networking emissions in g/GB. We need to define this based on the architecture of the application. From the application analysis we will have the data in and data out of the specific application. What we need as a reference is the g/GB figure, and we could define it in different ways:

for web application
for mobile apps

Outline any further information requirements

No response

Code of Conduct

[X] I agree to follow this project's Code of Conduct

srini1978 commented 2 years ago

@mrchrisadams I think, in essence, we want to understand the emissions associated with transferring a GB of data across the internet, between data centers, from data center to consumer, etc.

mrchrisadams commented 2 years ago

@srini1978 can you please share some links to the case studies and specific use cases?

This one is quite a rabbit hole, and after we've collated some sources of data on this issue we may yet need to escalate to commissioning some actual research for this, as most of the calculations I see use an attributional model, not a marginal model, for calculating this.

I'll add some links from earlier discussions myself.

mrchrisadams commented 2 years ago

For what it's worth, this discussion is worth referring to, as it dives a bit more deeply into this subject.

https://github.com/Green-Software-Foundation/sci-data/issues/32

There are attributional approaches you can take to get a very coarse top level estimate by basically taking all the energy use associated with say… fixed core networks, and all the network transfer between them.

However, this can be a bit like looking at all the embodied carbon to build a bunch of bike lanes, looking at the kilometers travelled on them, and then coming up with a rate to measure emissions for travel on those lanes on a per-kilometer basis.

You can derive a number, sure, but it might not be something you can meaningfully change through more or less usage.

If you really need a number for a rate

If you need a number, until others chime in, one approach would be to use the 0.23 kWh per gigabyte figure cited here in this piece from Benjamin Davy at TEADS, then apply the improvement factor:

The most recent estimate for the year 2015 is from Jens Malmodin et al. ¹¹ at 0.023 kWh/GB for the IP core network only, excluding data centers and user devices.

If you follow the trend of network infrastructure getting roughly twice as efficient every two years (Koomey's Law), then a figure of 0.23 kWh/GB ends up being 0.02875 kWh/GB in 2021, eight times smaller.

year	kWh / GB
2015	0.23
2017	0.115
2019	0.0575
2021	0.02875
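
The halving arithmetic behind this table can be reproduced in a few lines. This is only a sketch: the function name and its parameters are illustrative, and the two-year doubling period is the rule of thumb used in this comment, not a measured trend.

```python
def projected_intensity(base_kwh_per_gb, base_year, target_year,
                        doubling_period_years=2.0):
    """Project energy intensity, assuming efficiency doubles
    (intensity halves) every `doubling_period_years`."""
    elapsed_years = target_year - base_year
    return base_kwh_per_gb * 0.5 ** (elapsed_years / doubling_period_years)


# Rebuild the rows of the table from the 2015 starting figure.
for year in (2015, 2017, 2019, 2021):
    print(year, round(projected_intensity(0.23, 2015, year), 5))
```

Changing the starting figure or the doubling period shows how sensitive the result is to those two assumptions.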

I need to stress, though, that I think this is something we'd need to escalate to get more eyes on, or even to get advice on whether using the per-gigabyte approach is sensible.

mrchrisadams commented 2 years ago

OK, checked with @camcash17. This is the approach used within Cloud Carbon Footprint where folks try to apply numbers for energy usage based on transfer too. Within a hyperscaler's network, their numbers are even more aggressive in terms of energy efficiency improvements - their figure is 0.001 kWh/Gb.

Studies to date

There have not been many studies that deal specifically with estimating the electricity impact of exchanging data across data-centers. Most studies focus on estimating the impact of end-user traffic from the data center to the mobile phone; integrating the scope of the core network (what we are interested in), the local access to internet (optical fiber, copper, or 3G/4G/5G) and eventually the connection to the phone (WiFi or 4G).

On top of that, these studies use different methodologies and end up with results with orders of magnitude in differences. See appendix IV below for a summary of the most recent studies. Note that it is very hard to find recent studies that provide an estimation for optical fiber networks, the scope we are interested in.

Chosen coefficient

It is safe to assume hyper-scale cloud providers have a very energy efficient network between their data centers with their own optical fiber networks and submarine cable [source]. Data exchanges between data-centers are also done with a very high bitrate (~100 GbE -> 100 Gbps), thus being the most efficient use-case. Given these assumptions, we have decided to use the smallest coefficient available to date: 0.001 kWh/Gb. Again, we welcome feedback or contributions to improve this coefficient.

We want to thank @martin-laurent for providing this research and recommended coefficient.

Source: Methodology | Cloud Carbon Footprint

mrchrisadams commented 2 years ago

Another new paper, with some primary data, I think:

Electricity Consumption and Operational Carbon Emissions of European Telecom Network Operators

This study presents operational electricity consumption and greenhouse gas emissions for named European telecom network operators during 2015–2018. These results are also compared to data for 2010–2015. The study provides an extensive primary data set, collected from European Telecommunication Network Operators (ETNO) members, covering operations in Europe and beyond, providing data with higher granularity than publicly available sources. The collected data set corresponds to roughly 36 percent of European subscriptions and 8 percent of global subscriptions. This data set was used to calculate the aggregated annual electricity consumption for the assessed operators, as well as associated subscription intensities, in total, for Europe and per network type. Moreover, aggregated electricity-related carbon emissions and emissions from other sources were calculated. Finally, estimates were made for the overall network operation in Europe for 2018 and 2020. The study concludes that the electricity consumption and number of subscriptions for the reporting telecom network operators remained nearly constant (+1 percent and −3 percent, respectively) between 2015 and 2018, while data traffic increased by a factor of three.

Source: Electricity Consumption and Operational Carbon Emissions of European Telecom Network Operators by @MDPIOpenAccess

da-ekchajzer commented 2 years ago

Hello all,

+1 on all those resources @mrchrisadams.

As far as I know, there are three main approaches:

Per line

You can find per-line, per-year impact factors for Europe here: https://www.greenit.fr/wp-content/uploads/2021/12/EU-Study-LCA-7-DEC-EN.pdf#page=42

One possibility to allocate an impact to a specific process could be per hours of usage. You should evaluate the hours of usage of a line during one year to find the impact of a line per hour.

Limits

This won't distinguish between two processes that have the same duration but different data usage. This can be acceptable for processes that are not data-intensive, since most of the impacts do not depend on data usage.

Per GB: linear

As @mrchrisadams mentioned, the attributional per-GB model can be used.

You can find interesting resources below which differentiate between fixed and mobile networks (note that the impact factors include manufacturing, transport, usage and end of life):

For Europe I have this data: https://www.greenit.fr/wp-content/uploads/2021/12/EU-Study-LCA-7-DEC-EN.pdf#page=42
For France I have this data: https://www.arcep.fr/uploads/tx_gspublication/etude-numerique-environnement-ademe-arcep-volet02_janv2022.pdf#page=121

Per GB, per user: Power model

As @mrchrisadams mentioned, the linear approach implies that the impacts are proportional to the traffic. This is true neither at user level nor at systemic level.

See, for instance, the electricity use per workload for network equipment in Malmodin et al. 2020.

[Screenshot: electricity use versus workload for network equipment, from Malmodin et al. 2020]

We can see that the electricity use of network equipment can be modelled with an affine function a.x + b, where b would be the fixed impact allocated to each user "using" the network (note: b should also include the maintenance and support impacts, which might be very important) and a would be the electrical factor per GB used.

Malmodin et al. 2020 call this approach the Power model and give some impact factors for a and b, but I don't think that they can be extrapolated easily.

You can see more detailed explanations with equations here : https://github.com/Boavizta/boaviztapi/issues/62

My intuition

I think that the "Power model" approach is what we should be pushing for. It still lacks the embodied impacts, and it cannot be used yet since the data aren't available.

In my opinion, the best way to make the "Power model" usable would be to inventory all the nodes used in a process and, for each node, use a consumption profile that depends on its type (the consumption profile would be an affine function a.x + b).

It could be used as such for one user: consumption_profile(GB, nbUser) = a.GB + (b / nbUser)

b could include manufacture or maintenance impacts allocated over the period of use of the device.

To make this method usable we need to come up with a consumption profile for each type of node.
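
A minimal sketch of the per-user consumption profile described above, for a single node. The class name, the field names and the example values are hypothetical and only illustrate the shape of the calculation; real values of a and b per node type are exactly the data that is still missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """Affine consumption profile a.x + b for one type of network node."""
    a: float  # variable impact per GB transferred (hypothetical value)
    b: float  # fixed impact over the period, shared by all users of the node


def consumption_profile(gb, nb_users, profile):
    """Impact allocated to one user: variable part plus a share of the fixed part."""
    if nb_users <= 0:
        raise ValueError("nb_users must be positive")
    return profile.a * gb + profile.b / nb_users


# Hypothetical node: the numbers are placeholders, not measurements.
example_node = NodeProfile(a=0.001, b=50.0)
print(consumption_profile(gb=10, nb_users=100, profile=example_node))
```

With these placeholder values the fixed share dominates the result, which is the behaviour the power model is meant to capture for low-traffic processes.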

Moreover, we could argue that most of the network impacts (fixed usage impact + embodied impact) induced by a process depend on how much that process increases the daily demand peak.

This is because the network provider must always have enough devices running to match the maximum demand. New devices will be installed only if the maximum demand increases (or is planned to increase), which won't be the case if the extra traffic occurs at times of lower demand. Because most of the impacts of the network depend on the installed capacity and not on the usage, we can argue that most of the impacts depend on the maximum daily demand peak.

To account for this effect, we could add an uplift factor which takes into account the ability of a process to increase the peak (depending on the data usage, the time when the process occurs, …).

It's only a work in progress, but I am interested in your comments!

srini1978 commented 2 years ago

@mrchrisadams based on @da-ekchajzer's comments, is there any revision to the base emissions/GB figure called out above (0.001 kWh/Gb)?

This in fact accounts only for the energy associated with the GB. It does not take into account the fixed-line impact. We also need to include the embodied emissions from setting up the networking infrastructure in the first place.

jawache commented 2 years ago

This question is boiling down to "What is a generalised SCI score for networking?"

I'm guessing 95% of people just have a question along the lines of "I transferred X GB of data, how do I account for that in terms of carbon?". They are just looking for a Carbon/GB number to plug into their calculations.

If we give them kWh / GB, they will still have to figure out Carbon / GB anyway AND somehow factor in an embodied amount also (I can already see another issue asking about embodied carbon for networking).

I believe we need to make that call ourselves, however opinionated. We've discussed in the standards calls before about erring on the side of caution and estimating too high a number instead of too low.

As a minimum we just need this number:

SCI Generalised Networking = Xg Carbon / GB

A nice to have would be these numbers:

SCI DC->DC Networking = Xg Carbon / GB
SCI Consumer Networking = Xg Carbon / GB

@mrchrisadams what's the best approach, just use the 0.23 kWh / GB and multiply by the global average carbon intensity of electricity? I think we also need to include the embodied carbon of the networking and not just the use phase. One number that covers everything and is good enough for most situations.
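
As a rough sketch of that calculation: multiply an energy intensity by a grid carbon intensity, then add an embodied term. The function name and parameters are illustrative, and the grid intensity and embodied values below are placeholders rather than sourced figures.

```python
def carbon_per_gb(energy_kwh_per_gb, grid_intensity_g_per_kwh,
                  embodied_g_per_gb=0.0):
    """Return grams of carbon per GB: use phase plus an optional embodied share."""
    use_phase_g = energy_kwh_per_gb * grid_intensity_g_per_kwh
    return use_phase_g + embodied_g_per_gb


# Placeholder, NOT a sourced global average: substitute a real grid intensity.
PLACEHOLDER_GRID_INTENSITY_G_PER_KWH = 400.0

# 0.23 kWh/GB is the figure discussed in this thread; embodied is left at zero
# here because no embodied figure has been agreed.
print(carbon_per_gb(0.23, PLACEHOLDER_GRID_INTENSITY_G_PER_KWH))
```

Keeping the embodied share as a separate argument makes it explicit when a published number covers the use phase only.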

lmastalerz commented 1 year ago

I'm wondering where we landed on this. I'm currently working on a similar problem, assessing the impact of the W3C sustainable web design recommendations, and it would be nice to align calculations.

@da-ekchajzer, great point on different weights depending on the time of day. There's a great paper from the University of Bristol explaining just this: Rethinking Allocation in High-Baseload Systems: A Demand-Proportional Network Electricity Intensity Metric

@mrchrisadams I'm looking at 2021 research from ETH Zürich, and the extrapolation you mentioned (efficiency doubles every two years) seems to work really well: their 0.02 kWh/GB for WAN is very close to the predicted value.

ceddlyburge commented 1 year ago

I'm not sure how relevant this is, but all the discussion I have seen focusses on transferring an amount of data over the internet, but nothing relates to the route the data travels, when clearly this is a big factor. If we want to, for example, work out the benefits of using a CDN, we will need to focus on the distance / route of the data, as well as just the amount.

lmastalerz commented 1 year ago

This is a great point @ceddlyburge! Both distance and route have a potential of making an order(s) of magnitude of difference. Do you have any specific ideas on how this could be incorporated into the metric?

jawache commented 1 year ago

In terms of high level (greatest chance of being used and actioned) I would just have 3 values

Carbon / GB - DC to DC (for answering the question: should I move data to cleaner compute?)
Carbon / GB - DC to End User (for non-cached calls)
Carbon / GB - Edge Cache to End User (for cached calls)

With lots of averages and modelling.

da-ekchajzer commented 1 year ago

Those figures could be helpful in a first approach.

In my humble opinion, what we should push for in the long run is a method that is able to model the impacts of a data transfer over a specific route. We would give a typical route for our data transfer (node1 -> node2 -> node3) and the method would generate the consumption profile of our route (a.x+b). See: https://github.com/Green-Software-Foundation/sci-guide/issues/13#issuecomment-1145745271

Since all nodes are hosted in a specific region, we could also apply different carbon intensities in the case of transnational routes.

To be able to do so we would need to characterize several types of nodes (Wi-Fi router, aggregation router, LAN router...). For each node, we would compute a and b based on the inventory known or assumed and the consumption profile of each device in those nodes.

To implement these methods, we would need to collect (from crowdsourcing) for each type of node :

Inventory of devices
Consumption profiles for network equipment (f(load)=consumption). As a first approach, we could assume the profile based on the maximum power, which is often given by the manufacturer.
Average number of lines using the node in a given timeframe
Average quantity of data in a given timeframe
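
The route-based method could be sketched as a sum of per-node affine profiles, each weighted by the carbon intensity of the node's region. Everything below is an assumption made for illustration: the class, the field names and all the numbers are hypothetical, and the fixed part is simply divided by the average number of lines using the node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    a_kwh_per_gb: float    # variable energy per GB (hypothetical)
    b_kwh: float           # fixed energy over the timeframe (hypothetical)
    lines: int             # average number of lines using the node
    grid_g_per_kwh: float  # carbon intensity of the node's region (hypothetical)

    def energy_kwh(self, gb):
        # Affine profile a.x + b, with b shared between the lines on the node.
        return self.a_kwh_per_gb * gb + self.b_kwh / self.lines


def route_energy_kwh(route, gb):
    """Energy allocated to one line for sending `gb` through every node."""
    return sum(node.energy_kwh(gb) for node in route)


def route_carbon_g(route, gb):
    """Same allocation, converted with each node's regional carbon intensity."""
    return sum(node.energy_kwh(gb) * node.grid_g_per_kwh for node in route)


# Hypothetical two-node route crossing two regions.
route = [
    Node("wifi-router", 0.01, 1.0, 1, 50.0),
    Node("aggregation-router", 0.002, 100.0, 1000, 400.0),
]
print(route_energy_kwh(route, 5), route_carbon_g(route, 5))
```

The inventory, load profiles, line counts and data volumes listed above are what would be needed to replace these placeholder values.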

davidmytton commented 1 year ago

Using energy per unit of data is only valid if you know the total energy consumption and total data transferred by the network over a particular period. Networks have no direct power/data proportionality - there is almost no difference in energy consumption whether they are at zero or full load. Energy consumption is instead a factor of capacity, with some variation by usage as you get into the customer premises equipment and the user device. This is described most recently in the Malmodin paper already referenced above, with a real-world example shown in the 2021 Carbon Trust video streaming white paper.

The challenge is that implementing Malmodin's power model requires data from the network equipment to accurately calculate the power consumption. This expands the system boundary to multiple network operators, customer equipment, and the user device. Even then that's a simplification because of caching at various points (origin, local PoPs, user browser, etc) and how networks evolve over time.

This makes it tempting to just use a historical average, but it's going to wildly misrepresent the figures so as to be useless. Indeed, extrapolating network energy per unit of data averages is the source of the extreme projections of IT/network energy over the coming years. Total network energy is falling even as data usage is rapidly increasing. This observation invalidates the use of historical averages for present or future estimates.

My suggestion is to split the network into components and use the power model from the Malmodin paper. Each component behaves differently, which is why you need to split them. I have implemented both conventional and power models in a Python notebook as part of a more detailed paper I'm co-authoring with Malmodin (and others) which you're welcome to use (it's open source, and will be released properly as part of the paper hopefully later this year).
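
To make the contrast concrete, here is a small sketch of the two styles of model for a single network component. It is not the notebook mentioned above; the function names are illustrative, and the baseline, marginal and intensity parameters are hypothetical placeholders rather than values from the Malmodin paper.

```python
def transferred_gb(duration_h, bitrate_mbps):
    """Data volume in decimal GB for a constant bitrate over a duration."""
    megabits = bitrate_mbps * duration_h * 3600.0
    return megabits / 8.0 / 1000.0


def conventional_energy_wh(data_gb, intensity_kwh_per_gb):
    """Conventional model: energy scales linearly with data volume."""
    return data_gb * intensity_kwh_per_gb * 1000.0


def power_model_energy_wh(duration_h, bitrate_mbps, baseline_w,
                          marginal_w_per_mbps):
    """Power model: a baseline power plus a small bitrate-dependent part."""
    power_w = baseline_w + marginal_w_per_mbps * bitrate_mbps
    return power_w * duration_h


# One hour at 5 Mbps versus 10 Mbps, with hypothetical parameters.
for mbps in (5.0, 10.0):
    gb = transferred_gb(1.0, mbps)
    print(mbps,
          conventional_energy_wh(gb, intensity_kwh_per_gb=0.02),
          power_model_energy_wh(1.0, mbps, baseline_w=10.0,
                                marginal_w_per_mbps=0.02))
```

Doubling the bitrate doubles the conventional estimate, while the power-model estimate barely moves because the baseline dominates.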

ceddlyburge commented 1 year ago

> This is a great point @ceddlyburge! Both distance and route have a potential of making an order(s) of magnitude of difference. Do you have any specific ideas on how this could be incorporated into the metric?

Thanks Lukasz! I'm afraid I don't really have the answers, I'm more in search of them. I have some thoughts though.

There aren't that many big data centers in the world, and all the cloud providers are looking to improve their green credentials. I think it would be possible to create a matrix of these data centers and an estimate of the cost of transfer between them. This should include things like the construction and maintenance cost of subsea cables, if these are significant. Maybe this analysis would show that, given the vast volumes of data, the cost per Mb doesn't vary much, and so a single value could be used. This could be used to check that demand shifting to a different data center makes sense, for example.
There are not that many countries in the world either, so probably a matrix of countries is also possible (probably assuming a data center in the source country and an end user in the target country). This could be used along with analytics and other factors to help decide where to host a database for example.
A cost / Km would be useful. This would obviously be very rough, but could also be combined with analytics to decide where to host a website for example, and whether a CDN would be helpful.
I wouldn't worry about the last mile to the user's device (5G or Wi-Fi or whatever); it isn't something we have control over.

ceddlyburge commented 1 year ago

Also, I wonder whether we could use transfer speed as a proxy for carbon? A long single hop along fibre optic is probably faster and less carbon intensive than a route with many short wired hops. If we could establish a reasonable correlation between the two it might make things a lot better, as the transfer speed is very easy to measure.

Henry-WattTime commented 1 year ago

@davidmytton This is very interesting: "Networks have no direct power/data proportionality - there is almost no difference in energy consumption whether they are at zero or full load." So as an end user you can't reduce emissions of the network by reducing your data throughput? That energy use (and associated emissions) is going to happen regardless of how much data you transfer? Is there a way that a user can affect emissions/electricity consumption of networking hardware?

davidmytton commented 1 year ago

> So as an end user you can't reduce emissions of the network by reducing your data throughput? That energy use (and associated emissions) is going to happen regardless of how much data you transfer?

Correct. The user has no ability to influence the energy consumption of the network.

For the user's home router (commonly referred to as the customer premises equipment, CPE, in modeling), there is a baseline of energy consumption regardless of use, plus a marginal component dependent on usage. However, that marginal component is very small (a couple of watts for a typical video streaming session) and depends on the percentage of capacity used. The home router made up 38% of the energy consumption in the Carbon Trust report's video streaming example.

The user device accounts for the majority of the energy consumption (51% in the Carbon Trust report); however, changes in data volume also have tiny impacts on its energy use. A recent observational study showed that streaming 720p vs 1080p video resulted in a device energy consumption difference of only ~2W.

For the home router and user device, the ability for the user to impact energy consumption in any meaningful way really comes down to whether they're on or off.

ceddlyburge commented 1 year ago

I feel like data transfer size still affects energy, despite this. For example, there is a point at which an existing network is saturated, and new network components are deployed, and these have an energy cost. Attributing each byte of transfer to each joule of energy will clearly involve some assumptions though :)

davidmytton commented 1 year ago

Networks do get upgraded based on planned capacity growth, but they are always over-provisioned to cover at least peak load and usually more than that for growth headroom and redundancy. However, this is independent of point-in-time data volume so you can't attribute the increase to it.

There might not even be any increase. Numerous network operators report flat or decreasing energy consumption even as data volumes grow rapidly. Cogent reported a 2017-2021 compound annual network traffic growth rate of 32.7% but a decrease of 16.6% in electricity consumed. Virgin Media reported a 29.9% reduction in energy consumption per unit of data in 2021 compared to 2020, and 88% compared to 2015. Sprint's total energy consumption has remained flat (~1.9 MWh for each of 2014 to 2019) even as network data usage has increased.

Calculating joules (or kWh) per unit of data can only be done once you have total data transfer and energy consumption. That gives you historical energy intensity, but it's not useful for attribution in the present or future.
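
A short sketch of what "historical energy intensity" means in practice, using the Cogent figures quoted above. The function names are illustrative, and two readings of those figures are assumptions rather than facts from the report: that 2017-2021 means four years of compounding, and that the 16.6% decrease is the total change over the period rather than an annual rate.

```python
def energy_intensity(total_energy_kwh, total_data_gb):
    """Historical intensity: only computable once both totals are known."""
    if total_data_gb <= 0:
        raise ValueError("total_data_gb must be positive")
    return total_energy_kwh / total_data_gb


def intensity_ratio(traffic_cagr, years, energy_change_total):
    """End-of-period intensity divided by start-of-period intensity."""
    traffic_multiplier = (1.0 + traffic_cagr) ** years
    energy_multiplier = 1.0 + energy_change_total
    return energy_multiplier / traffic_multiplier


# Assumed reading of the Cogent figures: 4 compounding years, -16.6% in total.
ratio = intensity_ratio(traffic_cagr=0.327, years=4, energy_change_total=-0.166)
print(f"intensity at end of period = {ratio:.2f} x intensity at start")
```

Because the ratio depends entirely on the period chosen, an intensity derived this way describes the past and says little about what an extra gigabyte costs today.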

Henry-WattTime commented 1 year ago

This aligns with what we are finding with embodied emissions from hardware as well. The hardware already exists (data centers are overprovisioned with hardware to meet peak capacity, and the manufacturing emissions have already occurred), but we still encourage people to use less hardware through how the SCI is structured.

Is this a problem that network providers are working on? Do they idle machines in low energy states when there is less throughput (that seems to be the case in datacenters)? Is that even possible?

davidmytton commented 1 year ago

> Is this a problem that network providers are working on? Do they idle machines in low energy states when there is less throughput (that seems to be the case in datacenters)? Is that even possible?

It's more of a challenge to do this with network equipment because of how quickly a networking device needs to come out of an idle state. Power cycling network equipment is generally inefficient because it requires not just restarting the device, but also taking various steps at the protocol layer such as renegotiation of data rate and buffering of traffic whilst the deactivated device becomes available again. There is work on overall energy efficiency (which is why you see the reducing energy intensities and total energy consumption), but not in relation to power proportionality.

This is a bit different for mobile. For 4G networks, the baseline component accounts for 70-90% of the total energy consumption. 5G technologies include “sleep mode” functionality which can reduce the energy intensity by 8-12x compared to 4G (3-5x lower without sleep mode) and make it more proportional to usage (from a study of networks in Belgium).

Henry-WattTime commented 1 year ago

Interesting discussion of this issue: https://fershad.com/writing/website-carbon-beyond-data-transfer/

Green-Software-Foundation / sci-guide