USEPA / ElectricityLCI

Missing eGRID subregion generation by fuel category reference data #211

Open dt-woods opened 7 months ago

dt-woods commented 7 months ago

The electricity baseline provides a user-defined configuration value for 'egrid_year', which causes the data file '~/electricitylci/data/egrid_subregion_generation_by_fuelcategory_reference_[year].csv' to be accessed in 'egrid_energy.py' (referenced in 'generation_mix.py').

ElectricityLCI provides only two of these CSV files: 2014 and 2016. See electricitylci/data/egrid_subregion_generation_by_fuelcategory_reference_2016.csv.

This means that the ELCI_2 configuration model is unsupported. It also means that future baselines are hindered by the lack of this data file.
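
A quick way to see which 'egrid_year' values are currently backed by a reference file is to scan the data directory. This is only a minimal sketch: the helper names are hypothetical (they are not part of ElectricityLCI), and the file-name pattern is taken from the 2016 file named above.

```python
import os
import re

# File-name pattern based on the reference CSV named in this issue.
REF_PATTERN = re.compile(
    r"^egrid_subregion_generation_by_fuelcategory_reference_(\d{4})\.csv$"
)


def available_reference_years(data_dir):
    """Return the sorted years that have a reference CSV in data_dir.

    Hypothetical helper for illustration only.
    """
    years = []
    for name in os.listdir(data_dir):
        match = REF_PATTERN.match(name)
        if match:
            years.append(int(match.group(1)))
    return sorted(years)


def has_reference(data_dir, egrid_year):
    """Check whether a configured egrid_year has a matching reference CSV."""
    return int(egrid_year) in available_reference_years(data_dir)
```

Any configured year for which has_reference() returns False would hit the missing-file problem described here.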

In order to support the current development and future baselines, a little more transparency is needed regarding the following:

dt-woods commented 7 months ago

https://github.com/USEPA/ElectricityLCI/blob/master/electricitylci/data/egrid_subregion_generation_by_fuelcategory_reference_2016.csv

dt-woods commented 7 months ago

Note that it does not appear that StEWI has the facility generation data from eGRID. Tried the various formats with "getInventory," but failed to find 'Electricity' data.

Found it here:

bl-young commented 7 months ago

stewi.getInventory('eGRID', year, stewiformat='flowbyfacility') will return a dataframe that includes emissions and Electricity output as a flow.

Note also that stewi.getInventoryFacilities('eGRID', year) includes the fuel type by facility.

My guess is that some combination of these generated the files originally, but I do not know.

dt-woods commented 7 months ago

Example code:

import logging
import os

import pandas as pd
import stewi

# Assumption: data_dir is the package data directory defined in
# electricitylci.globals.
from electricitylci.globals import data_dir


def make_egrid_subregion_ref(year):
    """Generate the 'egrid_subregion_generation_by_fuelcategory_reference'
    CSV data file for a given year (if it does not already exist).

    Parameters
    ----------
    year : int
        Data year.
    """
    # Define the output file, which should be in data directory of package.
    ref_name = "egrid_subregion_generation_by_fuelcategory_reference_%s.csv" % year
    ref_path = os.path.join(data_dir, ref_name)

    if os.path.exists(ref_path):
        logging.info(
            "eGRID subregion generation inventory %s reference exists" % year)
    else:
        logging.info(
            "Creating eGRID subregion generation inventory "
            "%s reference CSV" % year)

        # Pull the inventory data from stewi (includes Electricity as a flow).
        a = stewi.getInventory("eGRID", year, stewiformat="flowbyfacility")

        # Pull facility metadata from stewi for the same data year.
        meta_cols = [
            'FacilityID',
            'eGRID subregion acronym',
            'Plant primary coal/oil/gas/ other fossil fuel category'
        ]
        b = stewi.getInventoryFacilities("eGRID", year)[meta_cols]

        # Merge two data frames together to get inventory + facility metadata.
        c = pd.merge(
            left=a.query("FlowName == 'Electricity'"),
            right=b,
            on="FacilityID",
        )

        # Group by subregion and fuel category, then sum to get total
        # electricity generation. Update column names to match existing
        # CSV files in the repo.
        c = c.groupby(
            by=[
                'eGRID subregion acronym',
                'Plant primary coal/oil/gas/ other fossil fuel category']
        )['FlowAmount'].agg('sum').reset_index()
        c = c.rename(columns={
                'eGRID subregion acronym': 'Subregion',
                'Plant primary coal/oil/gas/ other fossil fuel category': 'FuelCategory',
                'FlowAmount': 'Electricity'
        })
        # Convert Electricity from MJ to MWh (1 MWh = 3600 MJ), then sort.
        c['Electricity'] /= 3600.0
        c = c.sort_values(by=['FuelCategory', 'Subregion'])
        c.to_csv(ref_path, index=False)
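
The merge, group, and unit-conversion steps in the function above can be checked on a small made-up data frame without pulling anything from stewi. The facility IDs, subregions, fuel categories, and amounts below are invented for illustration, and the short column names stand in for the longer eGRID ones.

```python
import pandas as pd

# Made-up inventory rows shaped like a flow-by-facility table (amounts in MJ).
inventory = pd.DataFrame({
    "FacilityID": [1, 1, 2, 3],
    "FlowName": ["Electricity", "Carbon dioxide", "Electricity", "Electricity"],
    "FlowAmount": [3600.0, 50.0, 7200.0, 1800.0],
})

# Made-up facility metadata (one row per facility).
facilities = pd.DataFrame({
    "FacilityID": [1, 2, 3],
    "Subregion": ["AKGD", "AKGD", "CAMX"],
    "FuelCategory": ["GAS", "GAS", "SOLAR"],
})

# Keep only the Electricity flow and attach subregion and fuel category.
merged = pd.merge(
    left=inventory.query("FlowName == 'Electricity'"),
    right=facilities,
    on="FacilityID",
)

# Sum generation per subregion and fuel category.
totals = (
    merged.groupby(["Subregion", "FuelCategory"])["FlowAmount"]
    .sum()
    .reset_index()
    .rename(columns={"FlowAmount": "Electricity"})
)

# Convert MJ to MWh (1 MWh = 3600 MJ) and order like the reference CSV.
totals["Electricity"] /= 3600.0
totals = totals.sort_values(by=["FuelCategory", "Subregion"]).reset_index(drop=True)
```

With these inputs the two gas facilities in AKGD combine to 3.0 MWh and the single solar facility in CAMX gives 0.5 MWh; the non-electricity flow is dropped before the merge.
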
dt-woods commented 7 months ago

^^^ The method above will be added to egrid_facilities.py and called at module level in egrid_energy.py, right before the reference CSV is accessed, to avoid a FileNotFoundError.
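
The "create it right before it is read" order can be sketched generically as follows. The helper name ensure_file and the builder callback are hypothetical and only illustrate the pattern; in this issue the builder role is played by make_egrid_subregion_ref.

```python
import logging
import os


def ensure_file(path, builder):
    """Call builder(path) only when path is missing, then return path.

    Hypothetical helper: 'builder' is any callable that writes the file.
    """
    if not os.path.exists(path):
        logging.info("Creating missing reference file %s", path)
        builder(path)
    return path


def write_header_only(path):
    """Stand-in builder that writes just the expected CSV header."""
    with open(path, "w") as f:
        f.write("Subregion,FuelCategory,Electricity\n")
```

Calling ensure_file(path, write_header_only) before opening the file means the reader never sees a missing path, and repeated calls do not rebuild a file that already exists.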

dt-woods commented 7 months ago

NOTE: I found no reference to either "egrid_subregion_totals_reference_2016.csv" or "egrid_subregion_totals_reference_2014.csv" so I omitted their creation.