Open DanniBradu opened 2 years ago
@mrchrisadams I think in essence, we want to understand the emissions associated with transferring GB of data across internet, between datacenters, between data center to consumer etc.
@srini1978 can you please share some links the case studies and specific use cases?
This one is quite a rabbit hole. After we've collated some sources of data on this issue, we may yet need to escalate to commissioning some actual research, as most of the calculations I see use an attributional model, not a marginal model.
I'll add some links from earlier discussions myself.
For what it's worth, this discussion is worth referring to, as it dives a bit more deeply into this subject.
https://github.com/Green-Software-Foundation/sci-data/issues/32
There are attributional approaches you can take to get a very coarse top level estimate by basically taking all the energy use associated with say… fixed core networks, and all the network transfer between them.
However this can be a bit like looking at all the embodied carbon to build a bunch of bike lanes, looking at the miles travelled on them, and then coming up with a rate to measure emissions for travel on those lines on a per kilometer basis.
You can derive a number, sure, but it might not be something you can meaningfully change through more or less usage.
If you need a number, until others chime in, one approach would be to take the 0.23 kWh per gigabyte figure cited in this piece from Benjamin Davy at TEADS, then apply the improvement factor:
The most recent estimate for the year 2015 is from Jens Malmodin et al. ¹¹ at 0.023 kWh/GB for the IP core network only, excluding data centers and user devices.
If you follow the trend of network infrastructure getting roughly twice as efficient every two years (Koomey's Law), then a figure of 0.23 kWh/GB in 2015 ends up being 0.02875 kWh/GB in 2021, which is eight times smaller after three halvings.
| Year | kWh / GB |
|---|---|
| 2015 | 0.23 |
| 2017 | 0.115 |
| 2019 | 0.0575 |
| 2021 | 0.02875 |
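
The table can be reproduced with a few lines of Python. This is a minimal sketch that assumes the "halves every two years" trend stated above; the function name `extrapolate_intensity` and its parameters are made up for illustration and do not come from any of the cited sources.

```python
def extrapolate_intensity(base_kwh_per_gb: float, base_year: int,
                          target_year: int,
                          halving_period_years: float = 2.0) -> float:
    """Project an energy intensity forward, halving it every `halving_period_years`.

    Assumption: efficiency doubles every two years, as described in the thread.
    """
    elapsed_years = target_year - base_year
    return base_kwh_per_gb * 0.5 ** (elapsed_years / halving_period_years)


# Starting point taken from the thread: 0.23 kWh/GB in 2015.
for year in (2015, 2017, 2019, 2021):
    print(year, round(extrapolate_intensity(0.23, 2015, year), 5))
```

Changing `halving_period_years` shows how sensitive the 2021 figure is to the assumed rate of improvement, which is one reason to treat the result as a rough estimate only.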
I need to stress, though, that I think we'd need to escalate this to get more eyes on it, or even to get advice on whether the per-gigabyte approach is sensible at all.
OK, checked with @camcash17. This is the approach used within Cloud Carbon Footprint where folks try to apply numbers for energy usage based on transfer too. Within a hyperscaler's network, their numbers are even more aggressive in terms of energy efficiency improvements - their figure is 0.001 kWh/Gb.
Studies to date
There have not been many studies that deal specifically with estimating the electricity impact of exchanging data across data-centers. Most studies focus on estimating the impact of end-user traffic from the data center to the mobile phone; integrating the scope of the core network (what we are interested in), the local access to internet (optical fiber, copper, or 3G/4G/5G) and eventually the connection to the phone (WiFi or 4G).
On top of that, these studies use different methodologies and end up with results with orders of magnitude in differences. See appendix IV below for a summary of the most recent studies. Note that it is very hard to find recent studies that provide an estimation for optical fiber networks, the scope we are interested in.
Chosen coefficient
It is safe to assume hyper-scale cloud providers have a very energy efficient network between their data centers with their own optical fiber networks and submarine cable [source]. Data exchanges between data-centers are also done with a very high bitrate (~100 GbE -> 100 Gbps), thus being the most efficient use-case. Given these assumptions, we have decided to use the smallest coefficient available to date: 0.001 kWh/Gb. Again, we welcome feedback or contributions to improve this coefficient.
We want to thank @martin-laurent for providing this research and recommended coefficient.
Another new paper, with some primary data, I think:
Electricity Consumption and Operational Carbon Emissions of European Telecom Network Operators
This study presents operational electricity consumption and greenhouse gas emissions for named European telecom network operators during 2015–2018. These results are also compared to data for 2010–2015. The study provides an extensive primary data set, collected from European Telecommunication Network Operators (ETNO) members, covering operations in Europe and beyond, providing data with higher granularity than publicly available sources. The collected data set corresponds to roughly 36 percent of European subscriptions and 8 percent of global subscriptions. This data set was used to calculate the aggregated annual electricity consumption for the assessed operators, as well as associated subscription intensities, in total, for Europe and per network type. Moreover, aggregated electricity-related carbon emissions and emissions from other sources were calculated. Finally, estimates were made for the overall network operation in Europe for 2018 and 2020. The study concludes that the electricity consumption and number of subscriptions for the reporting telecom network operators remained nearly constant (+1 percent and −3 percent, respectively) between 2015 and 2018, while data traffic increased by a factor of three.
Source: Electricity Consumption and Operational Carbon Emissions of European Telecom Network Operators by @MDPIOpenAccess
Hello all,
+1 on all those resources @mrchrisadams.
As far as I know, there are three main approaches:
You can find per-line, per-year impact factors for Europe here: https://www.greenit.fr/wp-content/uploads/2021/12/EU-Study-LCA-7-DEC-EN.pdf#page=42
One possibility for allocating an impact to a specific process is per hour of usage: estimate the hours a line is used during one year, then divide the line's yearly impact by that figure to get the impact per hour.
As @mrchrisadams mentioned, the attributional per-GB model can be used.
You can find interesting resources below which differentiate fixed and mobile networks (note that the impact factors include manufacturing, transport, usage and end of life):
As @mrchrisadams mentioned, the linear approach implies that the impacts are proportional to the traffic. This is true neither at the user level nor at the systemic level.
See, for instance, the electrical usage per workload for network equipment in Malmodin et al. 2020.
We can see that the electrical usage of network equipment can be modelled with an affine function `a.x+b`, where `b` would be the fixed impact allocated to each user "using" the network (note: `b` should also include the maintenance and support impacts, which might be very significant) and `a` would be the electrical factor per GB (Go) used.
Malmodin et al. 2020 call this approach the Power model and give some impact factors for `a` and `b`, but I don't think that they can be extrapolated easily.
You can see more detailed explanations with equations here : https://github.com/Boavizta/boaviztapi/issues/62
I think that the "Power Model" approach is what we should be pushing for. It still lacks the embodied impacts and cannot yet be used, since the data aren't available.
In my opinion, the best way to make the "Power model" usable would be to inventory all nodes used in a process and, for each node, use its consumption profile depending on its type (the consumption profile would be an affine function `a.x + b`).
It could be used as follows for one user:
`consumption_profile(Go, nbUser) = a.Go + (b / nbUser)`
`b` could include manufacture or maintenance impacts allocated over the period of use of the device.
To make this method usable we need to come up with a consumption profile for each type of node.
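
As a rough illustration of the per-user allocation described above, here is a minimal Python sketch. The class name `NodeProfile`, its field names and the numeric values are all hypothetical; no real `a` or `b` values are available in this thread, so the numbers only show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass
class NodeProfile:
    """Affine consumption profile a.x + b for one type of network node."""
    a_kwh_per_gb: float  # marginal energy per GB transferred (the `a` term)
    b_kwh: float         # fixed energy over the period, shared by all users (the `b` term)


def consumption_profile(profile: NodeProfile, gb: float, nb_user: int) -> float:
    """Energy allocated to one user: a * GB + b / nbUser."""
    if nb_user <= 0:
        raise ValueError("nb_user must be positive")
    return profile.a_kwh_per_gb * gb + profile.b_kwh / nb_user


# Hypothetical values for illustration only, not measured data.
router = NodeProfile(a_kwh_per_gb=0.001, b_kwh=50.0)
print(consumption_profile(router, gb=10.0, nb_user=1000))
```

With values like these, the fixed share `b / nbUser` dominates the marginal term, which is the behaviour the affine model is meant to capture.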
Moreover, we could argue that most of the network impacts (fixed usage impact + embodied impact) induced by a process depend on how much that process increases the daily demand peak.
This is because the network provider must always have enough devices running to match the maximum demand. New devices will be installed only if the maximum demand increases (or is planned to increase), which won't be the case if an increase in traffic occurs at times of lower demand. Because most of the impacts of the network depend on the installed capacity and not the usage, we can argue that most of the impacts depend on the maximum daily demand peak.
To account for this effect, we could add an uplift factor that reflects the ability of a process to increase the peak (depending on the data usage, the time when the process occurs, …).
It's only a work in progress, but I am interested in your comments!
@mrchrisadams based on @da-ekchajzer's comments, is there any revision to the base emissions/GB figure called out above (0.001 kWh/Gb)?
This in fact accounts only for the energy associated with the GB transferred. It does not take into account the fixed-line impact. We also need to include the embodied emissions from setting up the networking infrastructure in the first place.
This question is boiling down to "What is a generalised SCI score for networking?"
I'm guessing 95% of people just have a question along the lines of "I transferred X GB of data, how do I account for that in terms of carbon?". They are just looking for a Carbon/GB number to plug into their calculations.
If we give them kWh / GB, they will still have to figure out Carbon / GB anyway AND somehow factor in an embodied amount also (I can already see another issue asking about embodied carbon for networking).
I believe we need to make that call ourselves, however opinionated. We've discussed in the standards calls before about erring on the side of caution and estimating too high a number instead of too low.
As a min we just need this number:
SCI Generalised Networking = Xg Carbon / GB
A nice to have would be these numbers:
SCI DC->DC Networking = Xg Carbon / GB
SCI Consumer Networking = Xg Carbon / GB
@mrchrisadams what's the best approach, just use the 0.23 kWh / GB and multiply by the global average carbon intensity of electricity? I think we also need to include the embodied carbon of the networking and not just the use phase. One number that covers everything and is good enough for most situations.
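
The "one number" calculation proposed here could be sketched as follows. This is only an illustration: the grid intensity of 440 gCO2e/kWh is a placeholder assumption rather than a sourced global average, the embodied term is left at zero because no figure has been agreed in this thread, and the function names are hypothetical.

```python
# Placeholder assumption: substitute a sourced carbon intensity before real use.
ASSUMED_GRID_G_PER_KWH = 440.0


def carbon_per_gb(kwh_per_gb: float,
                  grid_g_per_kwh: float = ASSUMED_GRID_G_PER_KWH,
                  embodied_g_per_gb: float = 0.0) -> float:
    """Grams of CO2e per GB: use-phase energy times grid intensity, plus embodied."""
    return kwh_per_gb * grid_g_per_kwh + embodied_g_per_gb


def carbon_for_transfer(gb: float, kwh_per_gb: float,
                        grid_g_per_kwh: float = ASSUMED_GRID_G_PER_KWH,
                        embodied_g_per_gb: float = 0.0) -> float:
    """Grams of CO2e attributed to transferring `gb` gigabytes."""
    return gb * carbon_per_gb(kwh_per_gb, grid_g_per_kwh, embodied_g_per_gb)


# 0.23 kWh/GB is the 2015 figure discussed earlier in the thread.
print(carbon_per_gb(0.23))
print(carbon_for_transfer(10, 0.23))
```

Keeping the energy intensity, grid intensity and embodied term as separate parameters makes it easy to swap in better numbers later without changing the formula.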
I'm wondering where we landed on this. I'm currently working on a similar problem for assessing the impact of the W3C sustainable web design recommendations, and it would be nice to align calculations.
@da-ekchajzer, great point on different weights depending on the time of day. There's a great paper from the University of Bristol explaining just this: Rethinking Allocation in High-Baseload Systems: A Demand-Proportional Network Electricity Intensity Metric
@mrchrisadams I'm looking at 2021 research from ETH Zürich, and the extrapolation you mentioned (efficiency doubles every two years) seems to work really well: their 0.02 kWh/GB for WAN is very close to the predicted value.
I'm not sure how relevant this is, but all the discussion I have seen focusses on transferring an amount of data over the internet, but nothing relates to the route the data travels, when clearly this is a big factor. If we want to, for example, work out the benefits of using a CDN, we will need to focus on the distance / route of the data, as well as just the amount.
This is a great point @ceddlyburge! Both distance and route have a potential of making an order(s) of magnitude of difference. Do you have any specific ideas on how this could be incorporated into the metric?
At a high level (with the greatest chance of being used and actioned), I would just have three values:
Carbon / GB - DC to DC (for answering the question "should I move data to cleaner compute?")
Carbon / GB - DC to End User (for non-cached calls)
Carbon / GB - Edge Cache to End User (for cached calls)
With lots of averages and modelling.
Those figures could be helpful in a first approach.
In my humble opinion, what we should push for in the long run is a method that can model the impacts of a data transfer over a specific route. We would give a typical route for our data transfer (node1 - node2 - node3) and the method would generate the consumption profile of that route (`a.x+b`). See: https://github.com/Green-Software-Foundation/sci-guide/issues/13#issuecomment-1145745271
Since all nodes are hosted in a specific region, we could also apply different intensity in case of transnational routes.
To be able to do so, we would need to characterize several types of nodes (Wi-Fi router, aggregation router, LAN router...). For each node, we would compute `a` and `b` based on the known or assumed inventory and the consumption profile of each device in those nodes.
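
To make the route idea concrete, here is a hedged Python sketch that sums per-node affine profiles along a route and applies a different carbon intensity to each node's region. Everything in it is hypothetical: the `Node` class, the node names and every number are placeholders, since the per-node `a` and `b` values still have to be collected.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    name: str
    a_kwh_per_gb: float    # marginal energy per GB through this node
    b_kwh: float           # fixed energy over the period for this node
    nb_user: int           # users sharing the fixed part
    grid_g_per_kwh: float  # carbon intensity of the region hosting the node


def node_energy_kwh(node: Node, gb: float) -> float:
    """Energy allocated to one user at one node: a * GB + b / nbUser."""
    return node.a_kwh_per_gb * gb + node.b_kwh / node.nb_user


def route_energy_kwh(route: Sequence[Node], gb: float) -> float:
    """Consumption profile of the whole route: the sum of its node profiles."""
    return sum(node_energy_kwh(n, gb) for n in route)


def route_emissions_g(route: Sequence[Node], gb: float) -> float:
    """Emissions along the route, using each node's regional intensity."""
    return sum(node_energy_kwh(n, gb) * n.grid_g_per_kwh for n in route)


# Hypothetical route and values, for illustration only.
route = [
    Node("wifi-router", 0.0, 8.0, 4, 100.0),
    Node("aggregation-router", 0.01, 100.0, 100, 400.0),
]
print(route_energy_kwh(route, gb=10.0), route_emissions_g(route, gb=10.0))
```

Because the sum of affine functions is itself affine, the route as a whole still has an `a.x+b` profile, with `a` and `b` equal to the sums of the per-node terms.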
To implement these methods, we would need to collect (from crowdsourcing), for each type of node, the device inventory and the consumption profile of each device (`f(load)=consumption`). As a first approach, we could assume the profile based on the maximal power often given by the manufacturer.

Using energy per unit of data is only valid if you know the total energy consumption and total data transferred by the network over a particular period. Networks have no direct power/data proportionality - there is almost no difference in energy consumption whether they are at zero or full load. Energy consumption is instead a factor of capacity, with some variation by usage as you get into the customer premises equipment and the user device. This is described most recently in the Malmodin paper already referenced above, with a real-world example shown in the 2021 Carbon Trust video streaming white paper.
The challenge is that implementing Malmodin's power model requires data from the network equipment to accurately calculate the power consumption. This expands the system boundary to multiple network operators, customer equipment, and the user device. Even then that's a simplification because of caching at various points (origin, local PoPs, user browser, etc) and how networks evolve over time.
This makes it tempting to just use a historical average, but it's going to wildly misrepresent the figures so as to be useless. Indeed, extrapolating network energy per unit of data averages is the source of the extreme projections of IT/network energy over the coming years. Total network energy is falling even as data usage is rapidly increasing. This observation invalidates the use of historical averages for present or future estimates.
My suggestion is to split the network into components and use the power model from the Malmodin paper. Each component behaves differently, which is why you need to split them. I have implemented both conventional and power models in a Python notebook as part of a more detailed paper I'm co-authoring with Malmodin (and others) which you're welcome to use (it's open source, and will be released properly as part of the paper hopefully later this year).
> This is a great point @ceddlyburge! Both distance and route have a potential of making an order(s) of magnitude of difference. Do you have any specific ideas on how this could be incorporated into the metric?

Thanks Lukasz! I'm afraid I don't really have the answers, I'm more in search of them. I have some thoughts though.
Also, I wonder whether we could use transfer speed as a proxy for carbon? A long single hop along fibre optic is probably faster and less carbon intensive than a route with many short wired hops. If we could establish a reasonable correlation between the two it might make things a lot better, as the transfer speed is very easy to measure.
@davidmytton This is very interesting: "Networks have no direct power/data proportionality - there is almost no difference in energy consumption whether they are at zero or full load." So as an end user you can't reduce emissions of the network by reducing your data throughput? That energy use (and associated emissions) is going to happen regardless of how much data you transfer? Is there a way that a user can affect emissions/electricity consumption of networking hardware?
> So as an end user you can't reduce emissions of the network by reducing your data throughput? That energy use (and associated emissions) is going to happen regardless of how much data you transfer?

Correct. The user has no ability to influence the energy consumption of the network.
For the user's home router (commonly referred to as the customer premises equipment, CPE, in modeling), there is a baseline of energy consumption regardless of use, plus a marginal component dependent on usage. However, that marginal component is very small (a couple of watts for a typical video streaming session) and depends on the percentage of capacity used. The home router made up 38% of the energy consumption in the Carbon Trust report video streaming example.
The user device accounts for the majority of the energy consumption (51% in the Carbon Trust report); however, changes in data volume also have only a tiny impact on its energy use. A recent observational study showed that streaming 720p vs 1080p video resulted in a device energy consumption difference of only ~2W.
For the home router and user device, the ability for the user to impact energy consumption in any meaningful way really comes down to whether they're on or off.
I feel like data transfer size still affects energy, despite this. For example, there is a point at which an existing network is saturated, and new network components are deployed, and these have an energy cost. Attributing each byte of transfer to each joule of energy will clearly involve some assumptions though :)
Networks do get upgraded based on planned capacity growth, but they are always over-provisioned to cover at least peak load and usually more than that for growth headroom and redundancy. However, this is independent of point-in-time data volume so you can't attribute the increase to it.
There might not even be any increase. Numerous network operators report flat or decreasing energy consumption even as data volumes grow rapidly. Cogent reported a 2017-2021 compound annual network traffic growth rate of 32.7% but a decrease of 16.6% for electricity consumed. Virgin Media reported a 29.9% reduction of energy consumption per unit of data in 2021 compared to 2020 and 88% compared to 2015. Sprint total energy consumption has remained flat (~1.9 MWh for each of 2014 to 2019) even as network data usage has increased.
Calculating joules (or kWh) per unit of data can only be done once you have total data transfer and energy consumption. That gives you historical energy intensity, but it's not useful for attribution in the present or future.
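
A small Python sketch shows why a historical intensity misleads when reused for later estimates. The totals below are hypothetical (flat energy, tripled traffic) and the function names are made up for this example; only the arithmetic is the point.

```python
def historical_intensity(total_kwh: float, total_gb: float) -> float:
    """Energy intensity for a past period: total energy / total data."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return total_kwh / total_gb


def naive_projection_kwh(past_intensity: float, new_total_gb: float) -> float:
    """What you get if you wrongly assume energy scales with data volume."""
    return past_intensity * new_total_gb


# Hypothetical network: energy stays flat while traffic triples.
year1_kwh, year1_gb = 1_000_000.0, 10_000_000.0
year2_kwh, year2_gb = 1_000_000.0, 30_000_000.0

intensity_y1 = historical_intensity(year1_kwh, year1_gb)
intensity_y2 = historical_intensity(year2_kwh, year2_gb)
projected_y2_kwh = naive_projection_kwh(intensity_y1, year2_gb)

print(intensity_y1, intensity_y2)   # intensity falls as traffic grows
print(projected_y2_kwh, year2_kwh)  # naive projection vs. actual energy
```

In this made-up case the naive projection overstates year-two energy by a factor of three, even though each yearly intensity is a perfectly valid historical figure.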
This aligns with what we are finding with embodied emissions from hardware as well. The hardware already exists (data centers are overprovisioned with hardware to meet peak capacity, and the manufacturing emissions have already been caused), but we still encourage people to use less hardware through the way the SCI is structured.
Is this a problem that network providers are working on? Do they idle machines in low energy states when there is less throughput (that seems to be the case in datacenters)? Is that even possible?
> Is this a problem that network providers are working on? Do they idle machines in low energy states when there is less throughput (that seems to be the case in datacenters)? Is that even possible?

It's more of a challenge to do this with network equipment because of how quickly a networking device needs to come out of an idle state. Power cycling network equipment is generally inefficient because it requires not just restarting the device, but also taking various steps at the protocol layer such as renegotiation of data rate and buffering of traffic whilst the deactivated device becomes available again. There is work on overall energy efficiency (which is why you see the reducing energy intensities and total energy consumption), but not in relation to power proportionality.
This is a bit different for mobile. For 4G networks, the baseline component accounts for 70-90% of the total energy consumption. 5G technologies include “sleep mode” functionality which can reduce the energy intensity by 8-12x compared to 4G (3-5x lower without sleep mode) and make it more proportional to usage (from a study of networks in Belgium).
Interesting discussion of this issue: https://fershad.com/writing/website-carbon-beyond-data-transfer/
Contact Details
srrakhun@microsoft.com
Data request?
Networking emissions in g/GB. We need to define this based on the architecture of the application. From the application analysis we will have the data in and data out of the specific application. What we would need as a reference is the g/GB figure, and we could derive it in different ways.