
|Cirrus CI|

.. |Cirrus CI| image:: https://api.cirrus-ci.com/github/epfl-iglobalhealth/cumulator.svg
   :target: https://cirrus-ci.com/github/epfl-iglobalhealth/cumulator

=========
CUMULATOR
=========

A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare

Aim
---

Raise awareness about the carbon footprint of machine learning methods and encourage further optimization and the rational use of AI-powered tools. This work advocates for sustainable AI and the rational use of IT systems.

Key Carbon Indicators
---------------------

Prerequisites
-------------

The tool works on Linux, Windows, and macOS.

Required Libraries
~~~~~~~~~~~~~~~~~~

To run the web app:

Install and use
---------------

Free software: MIT license.

::

    pip install cumulator  # installs CUMULATOR

::

    from cumulator import base    # imports the script
    cumulator = base.Cumulator()  # creates a Cumulator instance

Measure cost of computations.

::

    from cumulator.base import Cumulator
    from sklearn import datasets
    from sklearn.linear_model import LinearRegression

    cumulator = Cumulator()
    model = LinearRegression()
    diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

    # without output and with keyword arguments
    cumulator.run(model.fit, X=diabetes_X, y=diabetes_y)

    # with output and without keyword arguments
    y = cumulator.run(model.predict, diabetes_X)

    # show results
    cumulator.computation_costs()
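Conceptually, ``run()`` wraps a function call, times it, and converts the elapsed seconds into energy via the hardware-load constant. The following is a minimal, hypothetical sketch of such a wrapper under that assumption; the ``EnergyTracker`` class and its method names are illustrative, not the library's actual implementation:

```python
import time


class EnergyTracker:
    """Illustrative sketch of a run()-style metering wrapper.

    The constants mirror the defaults documented in this README;
    the class itself is not part of the cumulator library.
    """

    def __init__(self, hardware_load=250 / 3.6e6, carbon_intensity=447, n_gpu=1):
        self.hardware_load = hardware_load        # kWh consumed per second
        self.carbon_intensity = carbon_intensity  # gCO2eq per kWh
        self.n_gpu = n_gpu
        self.total_seconds = 0.0

    def run(self, func, *args, **kwargs):
        # Time the wrapped call and accumulate the elapsed duration.
        start = time.perf_counter()
        result = func(*args, **kwargs)
        self.total_seconds += time.perf_counter() - start
        return result

    def footprint_gco2eq(self):
        # energy (kWh) = seconds * kWh-per-second * number of devices
        energy_kwh = self.total_seconds * self.hardware_load * self.n_gpu
        return energy_kwh * self.carbon_intensity


tracker = EnergyTracker()
total = tracker.run(sum, range(1_000_000))
print(f"{tracker.footprint_gco2eq():.2e} gCO2eq")
```

The wrapper returns the wrapped function's result unchanged, which is why ``y = cumulator.run(model.predict, diabetes_X)`` can be used as a drop-in replacement for a direct call.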

Measure cost of communications.

Any communication cost is calculated based on the 1-byte model, i.e. the energy consumed per byte of data traffic in data centers, developed by The Shift Project. Note that this estimates a lower bound on the energy cost, since internet data traffic uses not only data centers but also consumer-end servers. More information is available in the project report (link at the bottom of the page).
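As a back-of-the-envelope illustration of the 1-byte model, the cost of a transfer is the traffic volume times the energy-per-kB constant, converted to gCO2eq via the carbon intensity. The function name below and the assumption that 1 kB = 1024 bytes are mine; the constants are the defaults documented later in this README:

```python
# Defaults documented in this README (not imported from the library).
ONE_BYTE_MODEL = 6.894e-8  # kWh per kB of data-center traffic (The Shift Project)
CARBON_INTENSITY = 447     # gCO2eq per kWh (EU average, 2014)


def communication_footprint(n_bytes):
    """Return the estimated gCO2eq for n_bytes of data-center traffic."""
    energy_kwh = (n_bytes / 1024) * ONE_BYTE_MODEL  # assumes 1 kB = 1024 bytes
    return energy_kwh * CARBON_INTENSITY


# Estimated footprint of a 10 MB transfer.
print(f"{communication_footprint(10 * 1024**2):.2e} gCO2eq")
```

Because the 1-byte model only covers data centers, the real end-to-end footprint of a transfer is higher, as the paragraph above notes.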

Display your total carbon footprint:

::

    ########
    Overall carbon footprint: 1.02e-08 gCO2eq
    ########
    Carbon footprint due to computations: 1.02e-08 gCO2eq
    Carbon footprint due to communications: 0.00e+00 gCO2eq
    This carbon footprint is equivalent to 1.68e-13 incandescent lamps switched to leds.

Web-app use
-----------

Cumulator also contains a web app that automatically estimates the accuracy and the power consumption of four different algorithms (Linear Regression, Random Forest, Decision Tree, Neural Network) on a given input dataset.

.. image:: src/cumulator/web_app/templates/app_image.png

To open the web app, run ``src/cumulator/web_app/app.py``; the web app will then run on localhost. Through the web app it is possible to upload an input dataset and to indicate which column is the target: it will then be automatically excluded from the accuracy and power-consumption computation.

Default assumptions: geo-localization and CPU/GPU detection (both can be manually modified for a better estimate):

Cumulator will try to detect the CPU and the GPU in use and set the corresponding computation-cost value. If the detection fails, the default value is used. Future updates of the country-consumption dataset can be found on the official page (https://github.com/owid/energy-data?country=). The dataset needs to be slightly modified before Cumulator can use it; an automatic script to transform it is provided in ``base_repository/country_dataset_helpers.py``. To update the hardware dataset instead, use the script in ``base_repository/hardware/webscraper.py``.

::

    self.hardware_load = 250 / 3.6e6  # computation costs: power consumption of a typical GPU in Watts, converted to kWh/s
    self.one_byte_model = 6.894e-8    # communication costs: average energy impact of traffic in a typical data center, kWh/kB

Cumulator will try to set the carbon-intensity value based on the user's geographical position. If the detection fails, the default value is used; it can be modified manually.

::

    self.carbon_intensity = 447  # conversion to carbon footprint: average carbon intensity in gCO2eq/kWh in the EU in 2014
    self.n_gpu = 1               # number of GPUs used in parallel
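Combining these defaults as plain arithmetic gives a quick sanity check of what the tool reports. This is a sketch of the presumed formula (runtime × hardware load × GPU count × carbon intensity), not a call into the library:

```python
# Defaults documented in this README.
HARDWARE_LOAD = 250 / 3.6e6  # kWh per second for a typical 250 W GPU
CARBON_INTENSITY = 447       # gCO2eq per kWh (EU average, 2014)
N_GPU = 1

runtime_s = 3600  # one hour of computation

# 250 W for one hour is 0.25 kWh; multiply by carbon intensity for gCO2eq.
energy_kwh = runtime_s * HARDWARE_LOAD * N_GPU
footprint = energy_kwh * CARBON_INTENSITY
print(f"{footprint:.1f} gCO2eq for one GPU-hour")  # 111.8 gCO2eq
```

Overriding any of the attributes on a ``Cumulator`` instance (e.g. setting ``n_gpu`` to the actual device count) scales the estimate accordingly.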

Prediction consumption and F1-score on classification tasks
-----------------------------------------------------------

An example is reported below:

::

    from cumulator.base import Cumulator
    from sklearn.datasets import load_diabetes
    import pandas as pd
    import numpy as np

    cumulator = Cumulator()
    diabetes = load_diabetes()
    data1 = pd.DataFrame(data=np.c_[diabetes['data'], diabetes['target']],
                         columns=diabetes['feature_names'] + ['target'])
    cumulator.predict_consumptions_f1(data1, 'target')

Important: the model used to predict consumption and F1-score has been trained on datasets with up to:

Therefore, when using this feature, please check whether your dataset exceeds these values.

More information about the prediction feature and the detection of the user's position and GPU/CPU is available at https://github.com/epfl-iglobalhealth/CS433-2021-ecoML.

Project Structure
-----------------

::

    src/
    └── cumulator
        ├── base.py            <- implementation of the Cumulator class
        ├── prediction_feature <- implementation of the prediction feature
        ├── web_app            <- implementation of the web app for the prediction feature
        └── bonus.py           <- Impact Statement Protocol

Cite
----

Original paper::

    @article{cumulator,
      title={A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare},
      author={Trebaol, Tristan and Hartley, Mary-Anne and Jaggi, Martin and Shokri Ghadikolaei, Hossein},
      journal={Infoscience EPFL: record 278189},
      year={2020}
    }

Contribute
----------

Check CONTRIBUTING.rst.

ChangeLog
---------

Links
-----