catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

EIA entity harvesting function fails when no EIA tables are being loaded #289

Closed zaneselvans closed 5 years ago

zaneselvans commented 5 years ago

Describe the bug When no EIA tables are being loaded into PUDL (say, when it's just FERC Form 1 tables), the EIA entity harvesting process fails and halts the ETL.

To Reproduce On a system with an already initialized FERC Form 1 database, this minimal attempt to initialize a PUDL DB will fail in the _harvesting() function, when it attempts to concatenate together several (non-existent) dataframes:

pudl.init.init_db(
    ferc1_tables=['fuel_ferc1', 'plants_steam_ferc1'],
    ferc1_years=[2017],
    pudl_testing=True,
    ferc1_testing=False,
)

Expected behavior When no EIA data is being loaded into the DB, ETL should still succeed, but the entity tables should not get populated.

cmgosnell commented 5 years ago

This should be pretty easy to fix... I'll get on it.

Both pudl.init._ETL_cems() and pudl.init._ETL_ferc1() exit early if no years/states/tables are specified:

    if not states or not epacems_years:
        logger.info('Not ingesting EPA CEMS.')
        return None

The EIA extract and transform steps already have these off ramps, but I didn't add one to pudl.transform.eia.main().
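
For illustration, here is a rough sketch of what such an off ramp could look like. It is not the actual PUDL code: the function names `harvest_entities` and `transform_eia_main` are hypothetical, and plain lists of dicts stand in for dataframes so the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


def harvest_entities(tables):
    """Hypothetical stand-in for entity harvesting.

    Combines the rows of every table into one list. Like concatenating
    an empty list of dataframes, it fails when there is nothing to combine.
    """
    if not tables:
        raise ValueError("No tables to concatenate")
    return [row for rows in tables.values() for row in rows]


def transform_eia_main(eia_tables):
    """Hypothetical transform entry point with an early exit."""
    # Off ramp: skip harvesting entirely when no EIA tables were requested,
    # mirroring the guard used for EPA CEMS above.
    if not eia_tables:
        logger.info("Not ingesting EIA; skipping entity harvesting.")
        return None
    return harvest_entities(eia_tables)
```

With a guard like this, calling the entry point with no EIA tables returns None instead of raising, so the rest of the ETL can carry on and the entity tables are simply left unpopulated.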