NREL / foundational-industry-energy-data

The Foundational Industry Energy Dataset (FIED) is a unit-level characterization of energy use in the U.S. industrial sector.
https://nrel.github.io/foundational-industry-energy-data/

Foundational Industry Energy Dataset (FIED)

Summary

This is an effort by the National Renewable Energy Laboratory (NREL) and Argonne National Laboratory (ANL) to create an experimental foundational industry dataset for energy and emissions analysis and modeling. The code draws on publicly available data, primarily from the U.S. EPA, to compile a data set of unit-level energy use and characterization for U.S. industrial facilities in 2017.

The FIED, and the accompanying technical report, can be downloaded from its Open Energy Data Initiative submission.

Getting Started

Manual Data Downloads

Because of how they are provided, several data sets must be downloaded manually before the code can run successfully. These data sets and their directory locations are:

  1. Source Classification Codes (SCCs)

  2. 2017 National Emissions Inventory (NEI)

  3. GHGRP Emissions by Unit and Fuel Type

Environment

fied_environment.yml is the conda environment used when creating the foundational dataset. Its key dependencies include:

Compiling the FIED

In addition to manually downloading the above data sets, executing the calculations and data compilation requires two steps after activating the fied environment.

  1. ./frs/frs_extraction.py. This will download, extract, and format EPA FRS data. The resulting CSV should be saved in data/FRS/.
  2. fied_compilation.py. This will execute all of the remaining steps for compiling the foundational data set.

So, from the terminal or Anaconda prompt:

conda activate fied

python ./frs/frs_extraction.py

python fied_compilation.py
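
The two steps can also be chained from a single Python script, which may be convenient when the compilation is rerun often. The following is a minimal sketch and is not part of the repository: the helper names compilation_steps and run_steps are hypothetical, and it assumes the script is launched from the repository root with the fied environment already active.

```python
import subprocess
import sys


def compilation_steps(python=sys.executable):
    """Return the two compilation commands in the order they must run."""
    return [
        # Step 1: download, extract, and format EPA FRS data.
        [python, "./frs/frs_extraction.py"],
        # Step 2: run the remaining compilation steps.
        [python, "fied_compilation.py"],
    ]


def run_steps(steps, runner=subprocess.run):
    """Run each command in turn; check=True raises on the first failure."""
    for command in steps:
        runner(command, check=True)
```

Calling run_steps(compilation_steps()) runs both scripts in order. Because check=True makes subprocess.run raise CalledProcessError on a non-zero exit code, fied_compilation.py is not started if the FRS extraction fails.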

Directory Navigation

The underlying submodules and data are organized as follows:

Overview of FIED Data Fields

Data fields are compiled and described in FIED_datafields.yml. All facilities in the data set are represented by their unique registryID, which is their EPA Facility Registry Service ID.

Many of these data fields were included in original EPA data sources. See the FRS data dictionary for more information.
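
Because every record carries a registryID, unit-level rows can be collected per facility with a simple grouping step. The sketch below uses only the Python standard library. The sample rows, the exampleUnitLabel column, and the group_by_registry_id helper are made up for illustration; only registryID is a field name taken from the description above.

```python
import csv
import io
from collections import defaultdict


def group_by_registry_id(rows):
    """Collect unit-level rows under the facility they belong to."""
    facilities = defaultdict(list)
    for row in rows:
        facilities[row["registryID"]].append(row)
    return dict(facilities)


# Made-up sample data; a real run would read the compiled FIED file instead.
sample = io.StringIO(
    "registryID,exampleUnitLabel\n"
    "110000000001,boiler A\n"
    "110000000001,furnace B\n"
    "110000000002,kiln C\n"
)
facilities = group_by_registry_id(csv.DictReader(sample))
for registry_id, units in facilities.items():
    print(registry_id, len(units))
```

The csv module reads every value as a string, so the IDs are kept as text rather than converted to numbers.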

Identity

In addition to registryID, other identifying fields include

Geography

Various levels of geographic identifiers are included, such as

Units and Processes

Individual units are characterized (e.g., unit type, capacity, energy, throughput) where possible. Individual units may be associated with multiple processes.
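
One way to picture this structure is a record per unit whose characterization fields are optional and whose process list can hold any number of entries. This is an illustrative sketch only: the Unit class and its attribute names are hypothetical and do not reflect the actual FIED field names, which are described in FIED_datafields.yml.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    registry_id: str                  # facility the unit belongs to
    unit_type: Optional[str] = None   # left as None when not characterized
    capacity: Optional[float] = None  # left as None when not characterized
    # A unit may be associated with zero, one, or many processes.
    processes: List[str] = field(default_factory=list)


# Made-up example values.
unit = Unit(registry_id="110000000001", unit_type="boiler")
unit.processes.extend(["steam generation", "space heating"])
```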

Energy

Depending on the estimation approach, a unit may have a single estimate of energy use, or a range of energy estimates (i.e., minimum, median, upper quartile). Energy estimates based on the NEI are presented as a range.
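
To make the single-value versus range distinction concrete, the sketch below summarizes a list of candidate energy estimates for one unit. It is an assumption-laden illustration, not the method used to build the FIED: the summarize_energy helper, the dictionary keys, and the choice of the "inclusive" quartile method are all hypothetical.

```python
import statistics


def summarize_energy(estimates):
    """Return a single value, or a minimum/median/upper-quartile range."""
    values = sorted(estimates)
    if not values:
        raise ValueError("at least one energy estimate is required")
    if len(values) == 1:
        # Only one estimate is available, so report it directly.
        return {"energy": values[0]}
    # quantiles(n=4) returns the three quartile cut points;
    # the last one is the upper quartile.
    upper_quartile = statistics.quantiles(values, n=4, method="inclusive")[2]
    return {
        "minimum": values[0],
        "median": statistics.median(values),
        "upper_quartile": upper_quartile,
    }
```

For example, summarize_energy([7.5]) returns a single "energy" entry, while a list of several estimates returns the three range statistics.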

Greenhouse Gas (GHG) Emissions

Other

We've attempted to include additional descriptive fields where possible. These tend to be sparsely populated at this time.