Scaling Water Use

dt-woods commented 1 year ago

In the method generate_plant_water_use in module plant_water_use.py, about 708,000,000 MWh of generation is unaccounted for, which represents 1.60e+13 liters of withdrawal and 1.52e+13 liters of discharge. The cause is that the groupby call includes the "Water Type" field, which has 6,787 rows with NaN as the water type (see Reference).

This matters because the method assumes a linear relationship between net generation and water quantities (i.e., consumption and withdrawal). With this much generation unaccounted for, the calculated "fraction_gen" may be misleading.
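To illustrate the effect on a toy table: by default, pandas `groupby` drops rows whose key is NaN, so their generation disappears from the grouped total and any share computed against that total is inflated. This is a minimal sketch with made-up values (not taken from DATA_FILE), assuming pandas 1.1 or later for `dropna=False`. The share computed here is only illustrative and is not necessarily how "fraction_gen" is calculated in plant_water_use.py.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; column names mirror those in DATA_FILE.
gen_col = "Total net generation (MWh)"
toy = pd.DataFrame({
    "Plant Code": [1, 2, 3, 4],
    "Water Type": ["Fresh", "Fresh", np.nan, "Saline"],
    gen_col: [100.0, 300.0, 400.0, 200.0],
})

# Default behavior: the NaN-keyed row (plant 3) is silently dropped.
grouped_default = toy.groupby("Water Type")[gen_col].sum()

# dropna=False keeps NaN as its own group, so no generation is lost.
grouped_all = toy.groupby("Water Type", dropna=False)[gen_col].sum()

# Generation that the default groupby fails to account for.
missing = toy[gen_col].sum() - grouped_default.sum()

# Illustrative share of "Fresh" generation against each total.
frac_default = grouped_default["Fresh"] / grouped_default.sum()
frac_all = grouped_all["Fresh"] / grouped_all.sum()

print(f"missing generation: {missing} MWh")
print(f"Fresh share (NaN dropped): {frac_default:.3f}")
print(f"Fresh share (NaN kept):    {frac_all:.3f}")
```

In this toy case the share for "Fresh" is overstated when the NaN rows are dropped, because the denominator is missing plant 3's generation.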

Two possible options that I see:

1. Fill NaN values in Water Type with an "Other" catch-all category.
2. Given that the EIA generation and the "water" generation (source unknown for DATA_FILE) are the same for most facilities, it seems reasonable to skip the intermediate scaling and scale directly on EIA generation. The only caveat is facilities where EIA generation is negative (e.g., facilities 1077, 1233, 1404), which I believe is evidence against scaling water withdrawal linearly with generation.
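The caveat in the second option can be made concrete. Under a linear model, the water quantity is intensity times net generation, so a facility with negative net generation gets a negative withdrawal, which is not physically meaningful. The following is a minimal sketch with made-up numbers; the plant codes, column names, and intensity value are hypothetical.

```python
import pandas as pd

GAL_TO_L = 3.785411784  # liters per US gallon

# Hypothetical facilities: one normal, one with negative net generation,
# and one with zero net generation.
toy = pd.DataFrame({
    "Plant Code": [9001, 9002, 9003],
    "eia_generation_mwh": [5000.0, -120.0, 0.0],
    "withdrawal_gal_per_mwh": [20000.0, 20000.0, 20000.0],
})

# Linear scaling: intensity (gal/MWh) * generation (MWh) * (L/gal).
toy["annual_withdrawal_l"] = (
    toy["withdrawal_gal_per_mwh"]
    * toy["eia_generation_mwh"]
    * GAL_TO_L
)

# Flag facilities where linear scaling cannot give a sensible answer.
nonpositive = toy["eia_generation_mwh"] <= 0
flagged = toy.loc[nonpositive, "Plant Code"].tolist()

print(toy[["Plant Code", "annual_withdrawal_l"]])
print("Facilities needing separate treatment:", flagged)
```

How flagged facilities should be handled (excluded, zeroed, or assigned a reported value) is a modeling decision that this sketch does not make.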

Reference: https://github.com/USEPA/ElectricityLCI/blob/2232c41f2cb4fd333ad59c8710aa55906e6a7ed3/electricitylci/plant_water_use.py#L104

Reproducible code:

import pandas as pd

from electricitylci.globals import output_dir, data_dir
import electricitylci.PhysicalQuantities as pq

DATA_FILE = "NETL-EIA_powerplants_water_withdraw_consume_data_2016.csv"
water_df = pd.read_csv(
    f"{data_dir}/{DATA_FILE}", index_col=0, low_memory=False
)
water_df["annual_withdrawal"] = (
    water_df["Water Withdrawal Intensity Adjusted (gal/MWh)"]
    * water_df["Total net generation (MWh)"]
    * pq.convert(1, "galUS", "l")
)
water_df["annual_discharge"] = water_df["annual_withdrawal"] - (
    water_df["Water Consumption Intensity Adjusted (gal/MWh)"]
    * water_df["Total net generation (MWh)"]
    * pq.convert(1, "galUS", "l")
)
# Boolean mask of rows (not columns) with no water type assigned.
nan_rows = water_df['Water Type'].isna()
missed_g = water_df.loc[nan_rows, 'Total net generation (MWh)'].sum()
missed_w = water_df.loc[nan_rows, 'annual_withdrawal'].sum()
missed_d = water_df.loc[nan_rows, 'annual_discharge'].sum()
# Count the distinct states and plants affected.
missed_s = water_df.loc[nan_rows, 'State_y'].nunique()
missed_p = water_df.loc[nan_rows, 'Plant Code'].nunique()
print(
    "Missed generation:\t%0.3e MWh\n"
    "Missed withdrawal:\t%0.3e L\n"
    "Missed discharge:\t%0.3e L\n"
    "from %d plants across %d states." % (
        missed_g, missed_w, missed_d, missed_p, missed_s))

Generated output:

Missed generation:  7.083e+08 MWh
Missed withdrawal:  1.596e+13 L
Missed discharge:   1.519e+13 L
from 129 plants across 37 states.

dt-woods commented 1 year ago

For posterity, I believe the missing source for DATA_FILE is EIA's thermoelectric cooling water data, here: https://www.eia.gov/electricity/data/water/

dt-woods commented 1 year ago

It looks like the culprit is actually the EIA cooling detail workbook (see link above). Water types and water amounts are not matched for 132 plants where the cooling ID is PLANT. Since water amounts were more important, they were taken without their associated water types. The water types need to be scrubbed for each plant, year, month, generator, and boiler. That search should give two values for water type (empty and actual); see the figures below. A quick search shows that several are of type "Fresh"; therefore, the recommendation to label them as "Other" is unwise, since only fresh water resources are analyzed in AWARE-US.

Figure 1 (eia_plant_126_water_type). Where cooling ID is numeric, water type is available, but consumption and withdrawal are missing.

Figure 2 (eia_plant_126_no_water_type). Where cooling ID is PLANT, water type is missing, but consumption and withdrawal values are present.
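One way the scrubbing could look, as a minimal sketch on toy data: for each plant, year, and month, take the water type recorded on the numeric cooling-ID rows and copy it onto the PLANT row that carries the water amounts. The column names (`plant_id`, `cooling_id`, `water_type`, `withdrawal`) and the values are hypothetical and do not match the EIA workbook headers. Generator and boiler keys are omitted for brevity, and the attached fix may do this differently.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like the two figures: numeric cooling IDs carry the
# water type, while PLANT rows carry the water amounts.
toy = pd.DataFrame({
    "plant_id": [126, 126, 126, 200, 200],
    "year": [2016, 2016, 2016, 2016, 2016],
    "month": [1, 1, 1, 1, 1],
    "cooling_id": ["1", "2", "PLANT", "1", "PLANT"],
    "water_type": ["Fresh", "Fresh", np.nan, "Saline", np.nan],
    "withdrawal": [np.nan, np.nan, 50.0, np.nan, 75.0],
})
keys = ["plant_id", "year", "month"]

# Collect the water types that are recorded for each key.
typed = toy.dropna(subset=["water_type"])
per_key = typed.groupby(keys)["water_type"].agg(["nunique", "first"])

# Only trust keys with exactly one distinct water type; ambiguous keys
# are left unfilled for manual review.
lookup = per_key.loc[per_key["nunique"] == 1, "first"]
lookup = lookup.rename("water_type_fill")

# Attach the lookup and fill only the missing water types.
filled = toy.join(lookup, on=keys)
filled["water_type"] = filled["water_type"].fillna(filled["water_type_fill"])
filled = filled.drop(columns="water_type_fill")

plant_rows = filled[filled["cooling_id"] == "PLANT"]
print(plant_rows[["plant_id", "water_type", "withdrawal"]])
```

Restricting the fill to keys with a single distinct water type avoids guessing when a plant reports more than one type in the same month.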

dt-woods commented 1 year ago

Attached is a proposed solution. fix_py.txt

dt-woods commented 1 year ago

Apologies, here's the missing download file function:

def download_file(url, filename):
    """Helper function to download a file from the internet.

    Parameters
    ----------
    url : str
        The URL for a file stored somewhere on the internet.
    filename : str
        A path and file name to save a local copy of the web-based file.

    Returns
    -------
    None
        Check for file existence after running this function.

    Raises
    ------
    OSError
        If the download fails (e.g., urllib.error.URLError, which is a
        subclass of OSError).
    """
    import urllib.request
    urllib.request.urlretrieve(url, filename)
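A possible way to call the helper, sketched under assumptions: the wrapper name `download_if_missing` is hypothetical, and a local `file://` URL stands in for the real workbook link so that the example runs offline. It follows the docstring's advice to check that the file exists afterward.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_file(url, filename):
    """Copy of the helper above, repeated so this sketch is self-contained."""
    urllib.request.urlretrieve(url, filename)


def download_if_missing(url, filename):
    """Hypothetical wrapper: download only when no local copy exists.

    Returns True if the file exists after the call.
    """
    if not os.path.isfile(filename):
        download_file(url, filename)
    return os.path.isfile(filename)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a remote file: a small local CSV served via file://.
    source = Path(tmp) / "source.csv"
    source.write_text("Plant Code,Water Type\n126,Fresh\n")
    target = Path(tmp) / "copy.csv"

    ok = download_if_missing(source.as_uri(), str(target))
    copied_text = target.read_text()

print("File present after call:", ok)
```

Skipping the download when a local copy already exists avoids repeated requests for the same workbook, at the cost of not noticing upstream updates.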

USEPA / ElectricityLCI

Scaling Water Use #197