catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

ETL Pipeline Issue - Zenodo Download Failing #1868

Closed: kylebd99 closed this issue 1 year ago

kylebd99 commented 1 year ago

Describe the bug

In the second step of the ETL pipeline, "pudl_etl settings/etl_full.yml", the run crashes with a read timeout. Specifically, the failure occurs when it tries to download "https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-mo.zip". Opening this URL in a browser indicates that the file doesn't exist on Zenodo ("The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again."), so maybe the file structure was changed recently without the ETL being updated?
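To tell a missing file apart from a slow or unresponsive server, the URL can be probed outside the ETL. The following is a minimal sketch using only the Python standard library; it is not part of PUDL. The function name `probe_url` and its `opener` parameter are hypothetical, and the sketch assumes the server answers HEAD requests for this endpoint.

```python
import socket
import urllib.error
import urllib.request


def probe_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Classify a URL as 'ok', 'not_found', 'timeout', or 'error'.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    # Assumption: a HEAD request is enough to learn whether the file exists.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return "ok" if response.status == 200 else "error"
    except urllib.error.HTTPError as exc:
        # The server replied, but with an error status such as 404.
        return "not_found" if exc.code == 404 else "error"
    except urllib.error.URLError as exc:
        # Connection-level failure; urllib may wrap a timeout here.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "error"
    except (socket.timeout, TimeoutError):
        # A timeout raised directly while waiting on the socket.
        return "timeout"


if __name__ == "__main__":
    # URL taken from the report above.
    url = (
        "https://zenodo.org/api/files/"
        "19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-mo.zip"
    )
    print(probe_url(url))
```

A result of "not_found" would point to a changed file layout on Zenodo, while "timeout" would point to a network or server-side problem rather than a missing file.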

The full log from the command is as follows:

2022-08-25 10:19:44 [    INFO] pudl.extract.ferc1:712 Converting extracted FERC Form 1 table fuel_ferc1 into a pandas DataFrame.
2022-08-25 10:19:45 [    INFO] pudl.extract.ferc1:712 Converting extracted FERC Form 1 table plant_in_service_ferc1 into a pandas DataFrame.
2022-08-25 10:19:47 [    INFO] pudl.extract.ferc1:712 Converting extracted FERC Form 1 table plants_hydro_ferc1 into a pandas DataFrame.
2022-08-25 10:19:48 [    INFO] pudl.extract.ferc1:712 Converting extracted FERC Form 1 table plants_pumped_storage_ferc1 into a pandas DataFrame.
2022-08-25 10:19:48 [    INFO] pudl.extract.ferc1:712 Converting extracted FERC Form 1 table plants_small_ferc1 into a pandas DataFrame.
2022-08-25 10:19:48 [    INFO] pudl.extract.ferc1:712 Converting extracted FERC Form 1 table plants_steam_ferc1 into a pandas DataFrame.
2022-08-25 10:19:50 [    INFO] pudl.extract.ferc1:712 Converting extracted FERC Form 1 table purchased_power_ferc1 into a pandas DataFrame.
2022-08-25 10:19:52 [    INFO] pudl.transform.ferc1:2642 Transforming raw FERC Form 1 dataframe for loading into fuel_ferc1
2022-08-25 10:19:53 [    INFO] pudl.transform.ferc1:2642 Transforming raw FERC Form 1 dataframe for loading into plants_steam_ferc1
2022-08-25 10:19:54 [    INFO] pudl.transform.ferc1:1777 Identifying distinct large FERC plants for ID assignment.
/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/sklearn/utils/validation.py:1858: FutureWarning: Feature names only support names that are all strings. Got feature names with dtypes: ['quoted_name', 'str']. An error will be raised in 1.2.
  warnings.warn(
2022-08-25 10:25:13 [    INFO] pudl.transform.ferc1:1809 Successfully associated 21510 of 28380 (75.79%) FERC Form 1 plant records with multi-year plant entities.
2022-08-25 10:25:13 [    INFO] pudl.transform.ferc1:1822 Assigning IDs to multi-year FERC plant entities.
2022-08-25 10:25:22 [    INFO] pudl.transform.ferc1:1839 Identified 4385 orphaned FERC plant records. Adding orphans to list of plant entities.
2022-08-25 10:25:27 [    INFO] pudl.transform.ferc1:1861 Successfully Identified 1938 multi-year plant entities.
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1998 2 times in plant_id_ferc1=203
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1994 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1995 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1996 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1997 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1998 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1999 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2000 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2001 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2002 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2003 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2004 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2005 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2006 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2007 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2008 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2009 2 times in plant_id_ferc1=308
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1995 2 times in plant_id_ferc1=368
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1996 2 times in plant_id_ferc1=368
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1997 2 times in plant_id_ferc1=368
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1998 2 times in plant_id_ferc1=368
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2008 2 times in plant_id_ferc1=688
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2005 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2006 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2007 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2008 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2009 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2010 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2011 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2012 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2013 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2014 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2015 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2016 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2017 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2018 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2019 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2020 2 times in plant_id_ferc1=876
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=1995 2 times in plant_id_ferc1=1063
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2000 2 times in plant_id_ferc1=1153
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2019 2 times in plant_id_ferc1=1206
2022-08-25 10:25:39 [   ERROR] pudl.transform.ferc1:1931 Found report_year=2020 2 times in plant_id_ferc1=1206
2022-08-25 10:25:39 [    INFO] pudl.transform.ferc1:2642 Transforming raw FERC Form 1 dataframe for loading into plants_small_ferc1
/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/openpyxl/worksheet/_reader.py:312: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)
2022-08-25 10:25:40 [    INFO] pudl.transform.ferc1:2642 Transforming raw FERC Form 1 dataframe for loading into plants_hydro_ferc1
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:2642 Transforming raw FERC Form 1 dataframe for loading into plants_pumped_storage_ferc1
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:2642 Transforming raw FERC Form 1 dataframe for loading into plant_in_service_ferc1
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:1454 0.0233% of unpacked records were duplicates, and discarded.
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:1495 Col: begin_yr_bal, Cat: starting_balance
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:1495 Col: addition, Cat: additions
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:1495 Col: retirements, Cat: retirements
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:1495 Col: adjustments, Cat: adjustments
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:1495 Col: transfers, Cat: transfers
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:1495 Col: yr_end_bal, Cat: ending_balance
2022-08-25 10:25:41 [    INFO] pudl.transform.ferc1:2642 Transforming raw FERC Form 1 dataframe for loading into purchased_power_ferc1
2022-08-25 10:25:42 [ WARNING] pudl.transform.ferc1:1582 7 duplicate record_id values found in pre-transform table f1_purchased_pwr: ['f1_purchased_pwr_1998_12_238_0_1' 'f1_purchased_pwr_1998_12_238_0_2'
 'f1_purchased_pwr_1998_12_238_0_3' 'f1_purchased_pwr_1998_12_238_0_15'
 'f1_purchased_pwr_1998_12_238_0_4' 'f1_purchased_pwr_1998_12_238_0_5'
 'f1_purchased_pwr_2000_12_148_6_5'].
2022-08-25 10:25:42 [    INFO] pudl.metadata.classes:1579 Recoding purchased_power_ferc1.purchase_type_code
2022-08-25 10:25:43 [    INFO] pudl.extract.excel:214 Extracting eia923 spreadsheet data.
2022-08-25 10:30:32 [    INFO] pudl.extract.excel:214 Extracting eia860 spreadsheet data.
2022-08-25 10:30:40 [ WARNING] pudl.extract.excel:260 Extra columns found in page boiler_generator_assn: {'steam_plant_type', 'utility_name', 'plant_name', 'generator_association'}
2022-08-25 10:31:12 [ WARNING] pudl.extract.excel:260 Extra columns found in page generator: {'fercother', 'winter_capacity', 'fercdock', 'ferccogen', 'planned_derates_net_summer_cap', 'summer_capacity', 'fercewgdoc'}
2022-08-25 10:33:29 [ WARNING] pudl.extract.excel:260 Extra columns found in page generator_proposed: {'winter_capacity', 'winter_estimated_capacity', 'summer_estimated_capacity', 'summer_capacity'}
2022-08-25 10:34:33 [ WARNING] pudl.extract.excel:260 Extra columns found in page plant: {'ownertransdist', 'ferc_exempt_wholesale_generator_docket_number'}
2022-08-25 10:34:47 [ WARNING] pudl.extract.excel:260 Extra columns found in page utility: {'areacode'}
2022-08-25 10:34:47 [    INFO] pudl.extract.excel:214 Extracting eia860m spreadsheet data.
2022-08-25 10:34:57 [    INFO] pudl.transform.eia860:612 Transforming raw EIA 860 DataFrames for ownership_eia860 concatenated across all years.
/home/kylebd99/pudl/src/pudl/transform/eia860.py:77: PerformanceWarning: indexing past lexsort depth may impact performance.
  known_dupes = own_df.set_index(["plant_id_eia", "generator_id"]).loc[(56032, "1")]
2022-08-25 10:35:09 [    INFO] pudl.transform.eia860:612 Transforming raw EIA 860 DataFrames for generators_eia860 concatenated across all years.
2022-08-25 10:36:30 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.prime_mover_code
2022-08-25 10:36:30 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_code_1
2022-08-25 10:36:31 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_code_2
2022-08-25 10:36:31 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_code_3
2022-08-25 10:36:31 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_code_4
2022-08-25 10:36:32 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_code_5
2022-08-25 10:36:32 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_code_6
2022-08-25 10:36:33 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_1_transport_1
2022-08-25 10:36:33 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_1_transport_2
2022-08-25 10:36:33 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_1_transport_3
2022-08-25 10:36:34 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_2_transport_1
2022-08-25 10:36:34 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_2_transport_2
2022-08-25 10:36:35 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.energy_source_2_transport_3
2022-08-25 10:36:35 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.planned_new_prime_mover_code
2022-08-25 10:36:35 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.planned_energy_source_code_1
2022-08-25 10:36:36 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.startup_source_code_1
2022-08-25 10:36:36 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.startup_source_code_2
2022-08-25 10:36:37 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.startup_source_code_3
2022-08-25 10:36:37 [    INFO] pudl.metadata.classes:1579 Recoding generators_eia860.startup_source_code_4
2022-08-25 10:36:38 [    INFO] pudl.transform.eia860:612 Transforming raw EIA 860 DataFrames for plants_eia860 concatenated across all years.
2022-08-25 10:36:46 [    INFO] pudl.transform.eia860:612 Transforming raw EIA 860 DataFrames for boiler_generator_assn_eia860 concatenated across all years.
2022-08-25 10:36:46 [    INFO] pudl.transform.eia860:612 Transforming raw EIA 860 DataFrames for utilities_eia860 concatenated across all years.
2022-08-25 10:36:48 [    INFO] pudl.transform.eia923:1234 Transforming raw EIA 923 DataFrames for generation_fuel_eia923 concatenated across all years.
2022-08-25 10:37:23 [    INFO] pudl.metadata.classes:1579 Recoding generation_fuel_eia923.energy_source_code
2022-08-25 10:37:24 [    INFO] pudl.metadata.classes:1579 Recoding generation_fuel_eia923.fuel_type_code_aer
2022-08-25 10:37:24 [    INFO] pudl.metadata.classes:1579 Recoding generation_fuel_eia923.prime_mover_code
2022-08-25 10:37:31 [    INFO] pudl.transform.eia923:1234 Transforming raw EIA 923 DataFrames for boiler_fuel_eia923 concatenated across all years.
2022-08-25 10:37:42 [    INFO] pudl.metadata.classes:1579 Recoding boiler_fuel_eia923.energy_source_code
2022-08-25 10:37:43 [    INFO] pudl.transform.eia923:1234 Transforming raw EIA 923 DataFrames for generation_eia923 concatenated across all years.
2022-08-25 10:37:46 [    INFO] pudl.transform.eia923:1234 Transforming raw EIA 923 DataFrames for coalmine_eia923 concatenated across all years.
2022-08-25 10:37:55 [    INFO] pudl.helpers:204 Assigned state FIPS codes for 36.68% of records.
2022-08-25 10:37:55 [    INFO] pudl.metadata.classes:1579 Recoding coalmine_eia923.mine_type_code
2022-08-25 10:37:56 [    INFO] pudl.metadata.classes:1579 Recoding coalmine_eia923.mine_type_code
2022-08-25 10:37:56 [    INFO] pudl.transform.eia923:1234 Transforming raw EIA 923 DataFrames for fuel_receipts_costs_eia923 concatenated across all years.
2022-08-25 10:38:04 [    INFO] pudl.helpers:204 Assigned state FIPS codes for 36.68% of records.
2022-08-25 10:38:05 [    INFO] pudl.metadata.classes:1579 Recoding coalmine_eia923.mine_type_code
2022-08-25 10:38:19 [    INFO] pudl.metadata.classes:1579 Recoding fuel_receipts_costs_eia923.contract_type_code
2022-08-25 10:38:19 [    INFO] pudl.metadata.classes:1579 Recoding fuel_receipts_costs_eia923.energy_source_code
2022-08-25 10:38:19 [    INFO] pudl.metadata.classes:1579 Recoding fuel_receipts_costs_eia923.primary_transportation_mode_code
2022-08-25 10:38:19 [    INFO] pudl.metadata.classes:1579 Recoding fuel_receipts_costs_eia923.secondary_transportation_mode_code
2022-08-25 10:38:51 [    INFO] pudl.transform.eia:1104 Harvesting IDs & consistently static attributes for EIA plants
2022-08-25 10:39:56 [    INFO] pudl.transform.eia:629 Average consistency of static plants values is 99.19%
2022-08-25 10:39:57 [    INFO] pudl.transform.eia:1104 Harvesting IDs & consistently static attributes for EIA generators
2022-08-25 10:41:54 [    INFO] pudl.transform.eia:629 Average consistency of static generators values is 99.50%
2022-08-25 10:41:54 [    INFO] pudl.transform.eia:1104 Harvesting IDs & consistently static attributes for EIA utilities
2022-08-25 10:42:04 [    INFO] pudl.transform.eia:629 Average consistency of static utilities values is 99.98%
2022-08-25 10:42:04 [    INFO] pudl.transform.eia:1104 Harvesting IDs & consistently static attributes for EIA boilers
2022-08-25 10:42:07 [    INFO] pudl.transform.eia:629 Average consistency of static boilers values is 99.09%
2022-08-25 10:42:07 [    INFO] pudl.transform.eia:700 Inferring complete EIA boiler-generator associations.
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=1004, unit_id_pudl=3, unit_id_eia=['G108' '1' 'CT1']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=1904, unit_id_pudl=1, unit_id_eia=['HBR0' 'BDS0']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=1927, unit_id_pudl=2, unit_id_eia=['HBR0' 'RIV0']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=4040, unit_id_pudl=1, unit_id_eia=['PWG1' 'PWG2']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=7242, unit_id_pudl=1, unit_id_eia=['CC1' 'CC2']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=7757, unit_id_pudl=1, unit_id_eia=['CC1' 'CC2']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=10725, unit_id_pudl=1, unit_id_eia=['F801' 'F802']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=50973, unit_id_pudl=1, unit_id_eia=['BLK1' 'BLK2' 'BLK3']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=55153, unit_id_pudl=1, unit_id_eia=['STG1' 'STG2']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=55309, unit_id_pudl=1, unit_id_eia=['SMR2' 'SMR1']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=55502, unit_id_pudl=1, unit_id_eia=['G801' 'CC1' 'CC2']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=55701, unit_id_pudl=1, unit_id_eia=['CC1' 'G961']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=56041, unit_id_pudl=1, unit_id_eia=['NGS' 'MGS']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=56309, unit_id_pudl=1, unit_id_eia=['G401' 'G402']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=56350, unit_id_pudl=1, unit_id_eia=['115' 'BLK1']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=56350, unit_id_pudl=2, unit_id_eia=['116' 'BLK2']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=56998, unit_id_pudl=1, unit_id_eia=['43' 'PB4']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=56998, unit_id_pudl=2, unit_id_eia=['53' 'PB5']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=57666, unit_id_pudl=1, unit_id_eia=['1' '2']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=57794, unit_id_pudl=1, unit_id_eia=['CC01' 'CC02']
2022-08-25 10:42:30 [ WARNING] pudl.transform.eia:1002 Multiple EIA unit codes:plant_id_eia=60786, unit_id_pudl=1, unit_id_eia=['4343' '4141']
2022-08-25 10:42:30 [    INFO] pudl.metadata.classes:1579 Recoding plants_entity_eia.sector_id_eia
2022-08-25 10:42:30 [    INFO] pudl.metadata.classes:1579 Recoding boilers_entity_eia.prime_mover_code
2022-08-25 10:42:35 [ WARNING] pudl.glue.ferc1_eia:770 FERC to EIA glue breaking in plants_eia. There are too many null fields. Check the mapping spreadhseet.
2022-08-25 10:42:35 [ WARNING] pudl.glue.ferc1_eia:770 FERC to EIA glue breaking in utilities_eia. There are too many null fields. Check the mapping spreadhseet.
2022-08-25 10:42:35 [    INFO] pudl.glue.eia_epacems:37 grabbing original crosswalk
2022-08-25 10:42:35 [    INFO] pudl.glue.eia_epacems:80 splitting crosswalk into three normalized tables
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading coalmine_types_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading contract_types_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading energy_sources_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading ferc_accounts into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading ferc_depreciation_lines into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading fuel_transportation_modes_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading fuel_types_aer_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading plant_unit_epa into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading plants_pudl into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading power_purchase_types_ferc1 into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading prime_movers_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading sector_consolidated_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading utilities_entity_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading utilities_pudl into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading coalmine_eia923 into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading plants_eia into PUDL SQLite DB.
2022-08-25 10:42:43 [    INFO] pudl.load:72 Loading plants_entity_eia into PUDL SQLite DB.
2022-08-25 10:42:45 [    INFO] pudl.load:72 Loading utilities_eia into PUDL SQLite DB.
2022-08-25 10:42:45 [    INFO] pudl.load:72 Loading utilities_eia860 into PUDL SQLite DB.
2022-08-25 10:42:48 [    INFO] pudl.load:72 Loading utilities_ferc1 into PUDL SQLite DB.
2022-08-25 10:42:48 [    INFO] pudl.load:72 Loading utility_plant_assn into PUDL SQLite DB.
2022-08-25 10:42:48 [    INFO] pudl.load:72 Loading assn_plant_id_eia_epa into PUDL SQLite DB.
2022-08-25 10:42:48 [    INFO] pudl.load:72 Loading boilers_entity_eia into PUDL SQLite DB.
2022-08-25 10:42:48 [    INFO] pudl.load:72 Loading fuel_receipts_costs_eia923 into PUDL SQLite DB.
2022-08-25 10:43:06 [    INFO] pudl.load:72 Loading generation_fuel_eia923 into PUDL SQLite DB.
2022-08-25 10:43:49 [    INFO] pudl.load:72 Loading generation_fuel_nuclear_eia923 into PUDL SQLite DB.
2022-08-25 10:43:49 [    INFO] pudl.load:72 Loading generators_entity_eia into PUDL SQLite DB.
2022-08-25 10:43:50 [    INFO] pudl.load:72 Loading plant_in_service_ferc1 into PUDL SQLite DB.
2022-08-25 10:43:52 [    INFO] pudl.load:72 Loading plants_eia860 into PUDL SQLite DB.
2022-08-25 10:43:59 [    INFO] pudl.load:72 Loading plants_ferc1 into PUDL SQLite DB.
2022-08-25 10:43:59 [    INFO] pudl.load:72 Loading purchased_power_ferc1 into PUDL SQLite DB.
2022-08-25 10:44:02 [    INFO] pudl.load:72 Loading assn_gen_eia_unit_epa into PUDL SQLite DB.
2022-08-25 10:44:02 [    INFO] pudl.load:72 Loading boiler_fuel_eia923 into PUDL SQLite DB.
2022-08-25 10:44:21 [    INFO] pudl.load:72 Loading fuel_ferc1 into PUDL SQLite DB.
2022-08-25 10:44:22 [    INFO] pudl.load:72 Loading generation_eia923 into PUDL SQLite DB.
2022-08-25 10:44:27 [    INFO] pudl.load:72 Loading generators_eia860 into PUDL SQLite DB.
2022-08-25 10:44:57 [    INFO] pudl.load:72 Loading plants_hydro_ferc1 into PUDL SQLite DB.
2022-08-25 10:44:57 [    INFO] pudl.load:72 Loading plants_pumped_storage_ferc1 into PUDL SQLite DB.
2022-08-25 10:44:57 [    INFO] pudl.load:72 Loading plants_small_ferc1 into PUDL SQLite DB.
2022-08-25 10:44:58 [    INFO] pudl.load:72 Loading plants_steam_ferc1 into PUDL SQLite DB.
2022-08-25 10:44:59 [    INFO] pudl.load:72 Loading boiler_generator_assn_eia860 into PUDL SQLite DB.
2022-08-25 10:45:00 [    INFO] pudl.load:72 Loading ownership_eia860 into PUDL SQLite DB.
2022-08-25 10:45:02 [    INFO] pudl.etl:289 EPA CEMS years with no EIA plant data: [1995, 1996, 1997, 1998, 1999, 2000] Some timezones may be estimated based on plant state.
2022-08-25 10:45:02 [    INFO] pudl.etl:294 Processing EPA CEMS data and writing it to Apache Parquet.
2022-08-25 10:45:02 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-AL
2022-08-25 10:45:06 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-AR
2022-08-25 10:45:06 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-AZ
2022-08-25 10:45:07 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-CA
2022-08-25 10:45:07 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-CO
2022-08-25 10:45:07 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-CT
2022-08-25 10:45:08 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-DC
2022-08-25 10:45:08 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-DE
2022-08-25 10:45:09 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-FL
2022-08-25 10:45:09 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-GA
2022-08-25 10:45:11 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-IA
2022-08-25 10:45:11 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-ID
2022-08-25 10:45:11 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-IL
2022-08-25 10:45:13 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-IN
2022-08-25 10:45:15 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-KS
2022-08-25 10:45:15 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-KY
2022-08-25 10:45:16 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-LA
2022-08-25 10:45:17 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-MA
2022-08-25 10:45:17 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-MD
2022-08-25 10:45:18 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-ME
2022-08-25 10:45:18 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-MI
2022-08-25 10:45:19 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-MN
2022-08-25 10:45:19 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-MO
2022-08-25 10:45:21 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-MS
2022-08-25 10:45:21 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-MT
2022-08-25 10:45:22 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-NC
2022-08-25 10:45:22 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-ND
2022-08-25 10:45:22 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-NE
2022-08-25 10:45:23 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-NH
2022-08-25 10:45:23 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-NJ
2022-08-25 10:45:24 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-NM
2022-08-25 10:45:24 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-NV
2022-08-25 10:45:24 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-NY
2022-08-25 10:45:25 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-OH
2022-08-25 10:45:28 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-OK
2022-08-25 10:45:28 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-OR
2022-08-25 10:45:29 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-PA
2022-08-25 10:45:30 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-RI
2022-08-25 10:45:30 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-SC
2022-08-25 10:45:31 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-SD
2022-08-25 10:45:31 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-TN
2022-08-25 10:45:32 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-TX
2022-08-25 10:45:32 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-UT
2022-08-25 10:45:33 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-VA
2022-08-25 10:45:33 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-VT
2022-08-25 10:45:33 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-WA
2022-08-25 10:45:34 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-WI
2022-08-25 10:45:35 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-WV
2022-08-25 10:45:36 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1995-WY
2022-08-25 10:45:37 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-AL
2022-08-25 10:45:37 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-AR
2022-08-25 10:45:38 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-AZ
2022-08-25 10:45:38 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-CA
2022-08-25 10:45:39 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-CO
2022-08-25 10:45:39 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-CT
2022-08-25 10:45:39 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-DC
2022-08-25 10:45:40 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-DE
2022-08-25 10:45:40 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-FL
2022-08-25 10:45:41 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-GA
2022-08-25 10:45:42 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-IA
2022-08-25 10:45:42 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-ID
2022-08-25 10:45:43 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-IL
2022-08-25 10:45:43 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/deposit/depositions/4660268 from zenodo
2022-08-25 10:45:51 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/datapackage.json from zenodo
2022-08-25 10:45:55 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-il.zip from zenodo
2022-08-25 10:46:00 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-IN
2022-08-25 10:46:00 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-in.zip from zenodo
2022-08-25 10:46:07 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-KS
2022-08-25 10:46:07 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ks.zip from zenodo
2022-08-25 10:46:10 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-KY
2022-08-25 10:46:10 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ky.zip from zenodo
2022-08-25 10:46:21 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-LA
2022-08-25 10:46:21 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-la.zip from zenodo
2022-08-25 10:46:22 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-MA
2022-08-25 10:46:22 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ma.zip from zenodo
2022-08-25 10:46:24 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-MD
2022-08-25 10:46:24 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-md.zip from zenodo
2022-08-25 10:46:30 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-ME
2022-08-25 10:46:30 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-me.zip from zenodo
2022-08-25 10:46:32 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-MI
2022-08-25 10:46:32 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-mi.zip from zenodo
2022-08-25 10:46:37 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-MN
2022-08-25 10:46:37 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-mn.zip from zenodo
2022-08-25 10:46:48 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-MO
2022-08-25 10:46:48 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-mo.zip from zenodo
2022-08-25 10:47:09 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-MS
2022-08-25 10:47:09 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ms.zip from zenodo
2022-08-25 10:47:18 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-MT
2022-08-25 10:47:18 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-mt.zip from zenodo
2022-08-25 10:47:19 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-NC
2022-08-25 10:47:19 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-nc.zip from zenodo
2022-08-25 10:47:21 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-ND
2022-08-25 10:47:21 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-nd.zip from zenodo
2022-08-25 10:47:23 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-NE
2022-08-25 10:47:23 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ne.zip from zenodo
2022-08-25 10:47:25 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-NH
2022-08-25 10:47:25 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-nh.zip from zenodo
2022-08-25 10:47:31 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-NJ
2022-08-25 10:47:31 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-nj.zip from zenodo
2022-08-25 10:47:37 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-NM
2022-08-25 10:47:37 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-nm.zip from zenodo
2022-08-25 10:47:40 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-NV
2022-08-25 10:47:40 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-nv.zip from zenodo
2022-08-25 10:47:41 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-NY
2022-08-25 10:47:41 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ny.zip from zenodo
2022-08-25 10:47:52 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-OH
2022-08-25 10:47:52 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-oh.zip from zenodo
2022-08-25 10:48:36 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-OK
2022-08-25 10:48:36 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ok.zip from zenodo
2022-08-25 10:48:38 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-OR
2022-08-25 10:48:38 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-or.zip from zenodo
2022-08-25 10:48:39 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-PA
2022-08-25 10:48:39 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-pa.zip from zenodo
2022-08-25 10:48:55 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-RI
2022-08-25 10:48:55 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ri.zip from zenodo
2022-08-25 10:48:57 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-SC
2022-08-25 10:48:57 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-sc.zip from zenodo
2022-08-25 10:48:58 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-SD
2022-08-25 10:48:58 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-sd.zip from zenodo
2022-08-25 10:49:00 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-TN
2022-08-25 10:49:00 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-tn.zip from zenodo
2022-08-25 10:49:11 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-TX
2022-08-25 10:49:11 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-tx.zip from zenodo
2022-08-25 10:49:12 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-UT
2022-08-25 10:49:12 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-ut.zip from zenodo
2022-08-25 10:49:14 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-VA
2022-08-25 10:49:14 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-va.zip from zenodo
2022-08-25 10:49:16 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-VT
2022-08-25 10:49:16 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-vt.zip from zenodo
2022-08-25 10:49:17 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-WA
2022-08-25 10:49:17 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-wa.zip from zenodo
2022-08-25 10:49:18 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-WI
2022-08-25 10:49:18 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-wi.zip from zenodo
2022-08-25 10:49:31 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-WV
2022-08-25 10:49:31 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-wv.zip from zenodo
2022-08-25 10:49:46 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1996-WY
2022-08-25 10:49:46 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1996-wy.zip from zenodo
2022-08-25 10:49:54 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-AL
2022-08-25 10:49:54 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-al.zip from zenodo
2022-08-25 10:50:12 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-AR
2022-08-25 10:50:12 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ar.zip from zenodo
2022-08-25 10:50:18 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-AZ
2022-08-25 10:50:18 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-az.zip from zenodo
2022-08-25 10:50:30 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-CA
2022-08-25 10:50:30 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ca.zip from zenodo
2022-08-25 10:50:51 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-CO
2022-08-25 10:50:51 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-co.zip from zenodo
2022-08-25 10:51:16 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-CT
2022-08-25 10:51:16 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ct.zip from zenodo
2022-08-25 10:51:35 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-DC
2022-08-25 10:51:35 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-dc.zip from zenodo
2022-08-25 10:51:37 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-DE
2022-08-25 10:51:37 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-de.zip from zenodo
2022-08-25 10:51:48 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-FL
2022-08-25 10:51:48 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-fl.zip from zenodo
2022-08-25 10:54:15 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-GA
2022-08-25 10:54:15 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ga.zip from zenodo
2022-08-25 10:54:33 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-IA
2022-08-25 10:54:33 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ia.zip from zenodo
2022-08-25 10:54:46 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-ID
2022-08-25 10:54:46 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-id.zip from zenodo
2022-08-25 10:54:49 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-IL
2022-08-25 10:54:49 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-il.zip from zenodo
2022-08-25 10:55:20 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-IN
2022-08-25 10:55:20 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-in.zip from zenodo
2022-08-25 10:55:57 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-KS
2022-08-25 10:55:57 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ks.zip from zenodo
2022-08-25 10:56:07 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-KY
2022-08-25 10:56:07 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ky.zip from zenodo
2022-08-25 10:58:45 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-LA
2022-08-25 10:58:45 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-la.zip from zenodo
2022-08-25 10:59:11 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-MA
2022-08-25 10:59:11 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-ma.zip from zenodo
2022-08-25 10:59:34 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-MD
2022-08-25 10:59:34 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-md.zip from zenodo
2022-08-25 11:00:12 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-ME
2022-08-25 11:00:12 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-me.zip from zenodo
2022-08-25 11:00:18 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-MI
2022-08-25 11:00:18 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-mi.zip from zenodo
2022-08-25 11:01:07 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-MN
2022-08-25 11:01:07 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-mn.zip from zenodo
2022-08-25 11:01:27 [    INFO] pudl.etl:326 Processing EPA CEMS hourly data for 1997-MO
2022-08-25 11:01:27 [    INFO] pudl.workspace.datastore:192 Retrieving https://zenodo.org/api/files/19847a4e-f9d1-4b7a-840a-69b88e751a0e/epacems-1997-mo.zip from zenodo
Traceback (most recent call last):
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/urllib3/response.py", line 443, in _error_catcher
    yield
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/urllib3/response.py", line 566, in read
    data = self._fp_read(amt) if not fp_closed else b""
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/urllib3/response.py", line 532, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/http/client.py", line 465, in read
    s = self.fp.read(amt)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/urllib3/response.py", line 627, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/urllib3/response.py", line 565, in read
    with self._error_catcher():
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/urllib3/response.py", line 448, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='zenodo.org', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kylebd99/anaconda3/envs/pudl-dev/bin/pudl_etl", line 33, in <module>
    sys.exit(load_entry_point('catalystcoop.pudl', 'console_scripts', 'pudl_etl')())
  File "/home/kylebd99/pudl/src/pudl/cli.py", line 126, in main
    pudl.etl.etl(
  File "/home/kylebd99/pudl/src/pudl/etl.py", line 451, in etl
    etl_epacems(datasets["epacems"], pudl_settings, ds_kwargs)
  File "/home/kylebd99/pudl/src/pudl/etl.py", line 327, in etl_epacems
    df = pudl.extract.epacems.extract(year=year, state=state, ds=ds)
  File "/home/kylebd99/pudl/src/pudl/extract/epacems.py", line 152, in extract
    return ds.get_data_frame(partition).assign(year=year)
  File "/home/kylebd99/pudl/src/pudl/extract/epacems.py", line 107, in get_data_frame
    archive = self.datastore.get_zipfile_resource(
  File "/home/kylebd99/pudl/src/pudl/workspace/datastore.py", line 364, in get_zipfile_resource
    return zipfile.ZipFile(io.BytesIO(self.get_unique_resource(dataset, **filters)))
  File "/home/kylebd99/pudl/src/pudl/workspace/datastore.py", line 353, in get_unique_resource
    _, content = next(res)
  File "/home/kylebd99/pudl/src/pudl/workspace/datastore.py", line 341, in get_resources
    contents = self._zenodo_fetcher.get_resource(res)
  File "/home/kylebd99/pudl/src/pudl/workspace/datastore.py", line 243, in get_resource
    content = self._fetch_from_url(url).content
  File "/home/kylebd99/pudl/src/pudl/workspace/datastore.py", line 193, in _fetch_from_url
    response = self.http.get(
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/requests/sessions.py", line 745, in send
    r.content
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
  File "/home/kylebd99/anaconda3/envs/pudl-dev/lib/python3.10/site-packages/requests/models.py", line 822, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='zenodo.org', port=443): Read timed out.

Bug Severity

How badly is this bug affecting you? High: This bug is preventing me from using PUDL.

To Reproduce

This occurred while following the ETL steps from here after setting up the dev environment according to here. Specifically, it happens in the second step "pudl_etl settings/etl_full.yml" where the settings file was generated by pudl_setup.

Expected behavior

Hopefully, the ETL pipeline would run to completion.

Software Environment?

Additional context

Attached the settings file used (had to rename to .txt to attach). etl_full.txt

zaneselvans commented 1 year ago

Unfortunately, downloading everything directly from Zenodo during the ETL often fails, because there are 1300+ files and several GB of data in the archive. For this reason, we're mostly relying on the Zenodo archives as permanent "cold storage" for the original data, and using them to populate either a local or cloud storage cache, which is what we actually run the ETL against. You can do that too, or you can use the already processed data if you don't have a particular need to run the ETL yourself.

To pull the raw data down from Zenodo to a local cache, you can do:

pudl_datastore --dataset epacems

This is still pulling the data from Zenodo, so it'll still be a bit flaky and slow. You may need to run it several times (but each time it will only download additional data it didn't already get). Once all of the years of data have been downloaded, when you run the ETL, it'll use the local cache.
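
If re-running by hand gets tedious, the retries can be scripted. This is only a minimal sketch, not part of PUDL: `run_until_success` is a hypothetical helper, and it assumes that the command exits with a non-zero status when a download fails and that each re-run skips files already in the local cache, as described above.

```python
import subprocess
import sys
import time


def run_until_success(cmd, max_attempts=5, wait_seconds=30.0):
    """Re-run cmd until it exits with status 0; return the number of attempts used.

    Hypothetical helper for illustration only. It assumes a failed download
    makes the command exit with a non-zero status.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with {result.returncode}", file=sys.stderr)
        if attempt < max_attempts:
            # Give the remote server a moment before trying again.
            time.sleep(wait_seconds)
    raise RuntimeError(f"{cmd!r} still failing after {max_attempts} attempts")
```

Calling `run_until_success(["pudl_datastore", "--dataset", "epacems"])` would then repeat the command above up to five times, pausing between attempts.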

Alternatively, you can pull from our public Google Cloud Storage cache. To do this you'll need to be on the dev branch, and you'll need a Google Cloud account set up for billing (they offer $300 of free credits when you set up a new account). The publicly cached data is "requester pays", so that we don't get hammered with data egress fees if someone is automatically downloading the data. In North America the egress fees are around $0.25/GB, so the entire CEMS dataset costs around $1 to download this way (paid to Google, not us!). This method is fast and much more reliable.

You can either pre-download:

pudl_datastore --gcs-cache-path gs://zenodo-cache.catalyst.coop --dataset epacems

Or just tell the ETL that it should obtain its data this way directly (which will also cache it locally for future use):

pudl_etl --gcs-cache-path gs://zenodo-cache.catalyst.coop settings/etl_full.yml
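
The general idea behind "fetch once, then reuse the local copy" can be sketched in a few lines. This is an illustration of the read-through caching pattern only, not PUDL's actual datastore code: `read_through_cache`, `cache_dir`, and `fetch_remote` are hypothetical names.

```python
from pathlib import Path


def read_through_cache(name, cache_dir, fetch_remote):
    """Return the bytes for name, preferring a local copy over a remote fetch.

    Hypothetical helper: fetch_remote is any callable that takes a file name
    and returns its contents as bytes (e.g. a download from a remote archive).
    """
    cached = Path(cache_dir) / name
    if cached.exists():
        # Fast path: the file was downloaded on an earlier run.
        return cached.read_bytes()
    # Slow or flaky path: go to the remote source, then keep a local copy.
    content = fetch_remote(name)
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(content)
    return content
```

With a layout like this, the remote source is contacted at most once per file, which is why a fully populated cache makes later ETL runs independent of download reliability.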

If you'd like to just use the preprocessed data, you can access it through the PUDL Intake Data Catalog (again, try the dev branch...). This will also require setting up Google Cloud billing / authentication.

If you don't actually need the EPA CEMS data, you can remove it from (or comment it out in) the settings file and the ETL will run fine. You can run just the CEMS part of the ETL later if you want to, against an existing PUDL DB. Or, you can access the PUDL and raw FERC Form 1 DBs from our Datasette deployment: https://data.catalyst.coop

zaneselvans commented 1 year ago

Also: the reason you get a "file not found" when you try to go to the API URL that's failing isn't that the file isn't there, it's that it's only accessible if you're authenticated with a Zenodo API key, which would have to be sent to the webserver in the request / headers.
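
For anyone who wants to check this outside the ETL, here is a rough sketch of attaching a token to a request using only the standard library. `build_zenodo_request` is a hypothetical helper, the token value is whatever personal access token you created in your Zenodo account, and whether a given token is allowed to read these particular file URLs is an assumption I haven't verified.

```python
import urllib.request


def build_zenodo_request(url, token):
    """Build (but do not send) a GET request that carries a Zenodo access token.

    Hypothetical helper for illustration only.
    """
    request = urllib.request.Request(url)
    # Zenodo's REST API accepts a personal access token as a Bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    return request
```

Passing the result to `urllib.request.urlopen(request, timeout=60)` would send it; a plain browser visit to the same URL carries no such header, which matches the "not found" response described above.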

kylebd99 commented 1 year ago

This worked great! I had to restart the pudl_datastore command a couple times like you said, but it definitely managed it eventually.