USEPA / ElectricityLCI

EIA coalpublic2021.xls Excel file format cannot be determined #230

Open dt-woods opened 4 months ago

dt-woods commented 4 months ago

The URL used to pull the 2021 coal public Microsoft Excel workbook (http://www.eia.gov/coal/data/public/xls/coalpublic2021.xls) has a file format that does not match its extension. Examining the file, it appears to be an XML spreadsheet (perhaps mis-clicked in the "save as"?).
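For anyone who wants to confirm the mismatch without opening Excel, a minimal sketch of a byte-level check follows. The function name `sniff_spreadsheet_format` and its return labels are hypothetical (they are not part of ElectricityLCI or pandas); the check relies only on the well-known file signatures for OLE2 (.xls) and ZIP (.xlsx) containers and on the assumption that an XML spreadsheet starts with an XML declaration or mentions the Office spreadsheet namespace near the top.

```python
# Hypothetical helper, not part of ElectricityLCI or pandas.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx is a ZIP archive
SPREADSHEETML_NS = b"urn:schemas-microsoft-com:office:spreadsheet"


def sniff_spreadsheet_format(path, n_bytes=2048):
    """Guess a spreadsheet's real format from its leading bytes.

    Returns one of 'xls', 'xlsx', 'xml', or 'unknown'.
    """
    with open(path, "rb") as f:
        head = f.read(n_bytes)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Ignore a UTF-8 byte-order mark and leading whitespace before
    # looking for an XML declaration.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    if text.startswith(b"<?xml") or SPREADSHEETML_NS in head:
        return "xml"
    return "unknown"
```

If the downloaded coalpublic2021.xls is what it appears to be, calling this on it should return 'xml' rather than 'xls', which would be consistent with pandas being unable to pick an engine from the file contents.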

If you force load the file into Excel, it will open. Enabling editing, I fixed this by re-saving the XML spreadsheet as xls in the expected f7a_2021 folder. I am hopeful that this problem may go away; however, the vintage of the file format is a little concerning (pre-2003).

I see no need to write a new handler for the pandas read_excel call in the generate_upstream_coal_map function in coal_upstream.py, which is where the error is thrown (pasted below for posterity).

Model ELCI_2021 selected.
2024-02-22 15:55:44.450:INFO:model_config:_load_model_specs:Loading model specs
2024-02-22 15:55:44.458:INFO:model_config:check_model_specs:Checking model specs
2024-02-22 15:55:44.460:INFO:model_config:check_model_specs:Checks passed!
2024-02-22 15:55:44.462:INFO:model_config:build_model_class:Model Specs for ELCI_2021
2024-02-22 15:55:44.463:INFO:<string>:run_generation:get upstream process
2024-02-22 15:55:44.463:INFO:__init__:get_upstream_process_df:Generating upstream inventories...       
2024-02-22 15:55:44.464:INFO:coal_upstream:read_eia923_fuel_receipts:Loading data from previously downloaded excel file
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 exec(open("electricitylci/main.py").read())

File <string>:247

File <string>:97, in main()

File <string>:177, in run_generation()

File ~\ElectricityLCI\electricitylci\__init__.py:453, in get_upstream_process_df(eia_gen_year)
    450 import electricitylci.combinator as combine
    452 logging.info("Generating upstream inventories...")
--> 453 coal_df = coal.generate_upstream_coal(eia_gen_year)
    454 ng_df = ng.generate_upstream_ng(eia_gen_year)
    455 petro_df = petro.generate_petroleum_upstream(eia_gen_year)

File ~\ElectricityLCI\electricitylci\coal_upstream.py:541, in generate_upstream_coal(year)
    517 """
    518 Generate the annual coal mining and transportation emissions (in kg) for
    519 each plant in EIA923.
   (...)
    538     minimerge.drop(
    539 """
    540 # Read the coal input from eia
--> 541 coal_input_eia = generate_upstream_coal_map(year)
    542 # Read coal transportation and mining data
    543 coal_transportation = pd.read_csv(
    544     os.path.join(data_dir, '2016_Coal_Trans_By_Plant_ABB_Data.csv')
    545 )

File ~\ElectricityLCI\electricitylci\coal_upstream.py:284, in generate_upstream_coal_map(year)
    279 else:
    280     eia7a_path = find_file_in_folder(
    281         folder_path=expected_7a_folder,
    282         file_pattern_match=['coalpublic'],
    283         return_name=False)
--> 284 eia7a_df = pd.read_excel(
    285     eia7a_path,
    286     sheet_name='Hist_Coal_Prod',
    287     skiprows=3
    288 )
    289 eia7a_df = _clean_columns(eia7a_df)
    290 coal_criteria = eia_fuel_receipts_df['fuel_group']=='Coal'

ValueError: Excel file format cannot be determined, you must specify an engine manually.

dt-woods commented 3 months ago

The same error occurs with the 2022 file, and the same manual fix was used to correct it.