Closed zaneselvans closed 5 years ago
The problem appears to be coming from the call to cleanstrings
that is cleaning up the messy Form 1 fuel types. The bug is only exhibited when the raw input from FERC Form 1 excludes records that have fuel_quantity <= 0
AND when the strings are remapped with all of coal, oil, gas, nuclear, waste and other being part of the simplified taxonomy of fuel types. Excluding any one of those types from the remapping results in the call to cleanstrings succeeding. WTF?
A minimal set of code to reproduce from within a notebook...
import pandas as pd
import sys
import os
sys.path.append(os.path.abspath(os.path.join('..','..','..')))
import pudl
fuel_strings = {
'coal': pudl.constants.ferc1_coal_strings,
'oil': pudl.constants.ferc1_oil_strings,
'gas': pudl.constants.ferc1_gas_strings,
'nuclear': pudl.constants.ferc1_nuke_strings,
'waste': pudl.constants.ferc1_waste_strings,
'other': pudl.constants.ferc1_other_strings
}
def extract_fuel(bork=False):
fuel_raw = pd.read_sql("SELECT * FROM f1_fuel WHERE report_year=2016", pudl.extract.ferc1.connect_db())
fuel_raw = fuel_raw[(fuel_raw.fuel != '') & (fuel_raw.plant_name != '')]
if bork:
fuel_raw = fuel_raw[fuel_raw.fuel_quantity > 0]
return fuel_raw
def transform_fuel(fuel_df):
fuel_df.fuel = pudl.transform.pudl.cleanstrings(fuel_df.fuel, fuel_strings, unmapped='')
return fuel_df
fuel_raw = extract_fuel(bork=True)
fuel_clean = transform_fuel(fuel_raw)
Describe the bug When initializing the PUDL database, if the only FERC Form 1 year included is 2016, the import fails in
pudl.transform.ferc1.fuel()
with an infinite recursion. When more than 1 year of data is used to initialize the database, alongside 2016, the process completes successfully.To Reproduce
Expected behavior PUDL should be able to import any individual year of data from any of the data sources in isolation.