This is a repository of code to download, compile, and clean high-frequency electricity generation and emissions data for the United States.
Copyright 2022 James Archsmith and Paige Weber
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Code in this package will download and clean up data from many sources. The code is designed to run in a custom Anaconda (https://www.anaconda.com/) environment defined by the configuration file USElecData_conda.yaml. To use the code in this repository, you should do the following:
Create the Anaconda environment using the command conda env create -f USElecData_conda.yaml
Activate the USElecData Anaconda environment using the command conda activate USElecData
Windows: USElecData init --output-path=<path_to_output_data>
Mac/Linux: ./USElecData.sh init --output-path=<path_to_output_data>
This will create a local configuration file and set environment variables within your environment that scripts will use later. The <path_to_output_data> is a folder where you would like the data to be stored on your local system. Anticipate needing several hundred gigabytes of disk space. During this process you will be prompted to enter an API key for Data.gov. This is required to access some US government data APIs. If you don't currently have an API key, the script will provide a link for you to sign up for one.
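Because the output folder may need several hundred gigabytes, it can be worth checking free disk space before running init. The following is a minimal sketch, not part of this repository: the function names and the 300 GiB default are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3  # bytes per gibibyte


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem that would hold `path`."""
    p = Path(path).expanduser().resolve()
    # The output folder may not exist yet, so walk up to the nearest
    # existing ancestor before asking the operating system.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / GIB


def has_enough_space(path, required_gib=300):
    """True if at least `required_gib` GiB are free (300 is an assumed threshold)."""
    return free_space_gib(path) >= required_gib
```

For example, has_enough_space("/data/USElecData") returns False when the target drive has less than the assumed 300 GiB free; adjust required_gib to suit your system.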
Deactivate and reactivate the USElecData Anaconda environment using the commands
conda deactivate
conda activate USElecData
Windows: USElecData source all
Mac/Linux: ./USElecData.sh source all
Windows: USElecData build all
Mac/Linux: ./USElecData.sh build all
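The source and build commands above can also be chained from a script. The sketch below is an illustration only and assumes it is run from the repository root with the USElecData environment active; build_cli_command and run_pipeline are hypothetical helper names, and the use of shell=True on Windows is an assumption about how the USElecData launcher is resolved there.

```python
import platform
import subprocess


def build_cli_command(action, *args, system=None):
    """Return the command list for this platform, e.g. ['./USElecData.sh', 'source', 'all']."""
    system = system or platform.system()
    # Launcher names are taken from the instructions above.
    launcher = "USElecData" if system == "Windows" else "./USElecData.sh"
    return [launcher, action, *args]


def run_pipeline(runner=subprocess.run, system=None):
    """Run 'source all' and then 'build all', stopping at the first failure."""
    system = system or platform.system()
    for action in ("source", "build"):
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        # shell=True on Windows is an assumption, so that cmd.exe can resolve a
        # launcher script named USElecData without an explicit extension.
        runner(
            build_cli_command(action, "all", system=system),
            check=True,
            shell=(system == "Windows"),
        )
```

Calling run_pipeline() executes both steps in order. The runner parameter exists only so the sequence can be inspected or tested without launching the real commands.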
It is strongly recommended that you run all the code in this repository in the Anaconda environment defined by USElecData_conda.yaml. This provides a clean and consistent environment for all of the code, handles package management, and enables several features that should streamline the build process. The code has only been tested against that environment. It may run outside the environment (with some work), but you may encounter unforeseen errors.
At a minimum, you will need to put the file src/lib/USElecDataClass.py somewhere in your path (one possibility is to add src/lib to your PYTHONPATH environment variable).
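As an alternative to setting PYTHONPATH, the src/lib folder can be added to the module search path from inside Python. This is a minimal sketch under the assumption that you know where the repository was cloned; add_lib_to_path is a hypothetical helper, not something the repository provides.

```python
import sys
from pathlib import Path


def add_lib_to_path(repo_root):
    """Put <repo_root>/src/lib at the front of sys.path and return that folder."""
    lib_dir = str(Path(repo_root).expanduser().resolve() / "src" / "lib")
    # Avoid adding the same entry twice if this is called repeatedly.
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir


# Usage (the repository path is a placeholder):
# add_lib_to_path("/path/to/repo")
# import USElecDataClass
```

This only affects the current Python process, so it does not help R scripts or Python scripts started separately; for those, setting PYTHONPATH in the shell is the more general option.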
Initially this data repository needed to be built through a series of manual steps. These instructions cover those steps.
After pulling the repo, you should create a local configuration file by editing ./config_local_template.yaml to point to the local path where you will store source, intermediate, and output data. Save this file as ./config_local.yaml. DO NOT MAKE YOUR DATA PATH A SUBDIRECTORY OF YOUR LOCAL CODE REPOSITORY. You should then run each of the Python/R scripts in the order described in the Markdown file in each subfolder of the src directory. The src folders should be loaded and cleaned in the following order:
tz-info
EIA-Form860
EIA-Form923
EPA-CEMS
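The warning about the data path and the folder order above can be checked before starting a manual build. The sketch below is illustrative only: is_inside and check_layout are hypothetical helpers, and the data path is passed in directly rather than read from config_local.yaml, whose key names are not documented here.

```python
from pathlib import Path

# Folder order taken from the list above.
SRC_ORDER = ["tz-info", "EIA-Form860", "EIA-Form923", "EPA-CEMS"]


def is_inside(child, parent):
    """True if `child` is `parent` itself or lies anywhere beneath it."""
    child = Path(child).expanduser().resolve()
    parent = Path(parent).expanduser().resolve()
    return child == parent or parent in child.parents


def check_layout(repo_root, data_path):
    """Reject a data path inside the repo, then return the src folders in build order."""
    if is_inside(data_path, repo_root):
        raise ValueError(f"data path {data_path} must not be inside the repository {repo_root}")
    src = Path(repo_root) / "src"
    return [src / name for name in SRC_ORDER]
```

The returned list gives the folders to visit in sequence; the scripts within each folder still need to be run in the order given by that folder's Markdown file.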