USEPA / ElectricityLCI

Creative Commons Zero v1.0 Universal

Error in reduced schedule CSV file #259

Open dt-woods opened 1 month ago

dt-woods commented 1 month ago

In the read_eia923_fuel_receipts method in coal_upstream.py, Page 5 of the EIA923 Excel workbook is saved to CSV. The column headers include a newline character ("\n") for the following coal mine columns:

When written to CSV, these headers are all truncated to "Coalmine", dropping the context after the newline. This results in a CSV file with four columns of the same name, which causes errors when merging.

https://github.com/USEPA/ElectricityLCI/blob/e56268132f7607ead58a33bb5bdd525563a784f5/electricitylci/coal_upstream.py#L90

To fix, consider running the data frame through the _clean_columns method before writing to CSV.
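
A minimal sketch of the kind of header cleanup suggested above. It is not the project's _clean_columns implementation, whose exact behavior is not shown in this issue; clean_header and clean_headers are hypothetical helpers, and the raw header names are illustrative rather than copied from the workbook.

```python
import re


def clean_header(name):
    """Collapse any whitespace run (including newlines) into one underscore
    and lowercase the result. Hypothetical stand-in for _clean_columns."""
    return re.sub(r"\s+", "_", str(name).strip()).lower()


def clean_headers(names):
    """Clean every header and fail loudly if two headers collide."""
    cleaned = [clean_header(n) for n in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("duplicate column names after cleaning")
    return cleaned


# Illustrative headers only (assumed, not taken from the EIA923 workbook).
raw = [
    "Plant Id",
    "Coalmine\nType",
    "Coalmine\nName",
    "Coalmine\nState",
    "Coalmine\nCounty",
]
cleaned = clean_headers(raw)
print(cleaned)
```

With a pandas data frame, the same idea would be applied as `df.columns = clean_headers(df.columns)` just before the CSV is written, so each coal mine column keeps a distinct name.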

A symptom of this is a KeyError on the key 'fuel_group', accessed in generate_upstream_coal_map from the data frame returned by read_eia923_fuel_receipts.
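
To check whether an existing reduced CSV is affected, one option is to look for repeated names in its header row. The sketch below uses only the standard library; duplicate_headers is a hypothetical helper, and the sample header is made up for illustration.

```python
import csv
import io
from collections import Counter


def duplicate_headers(csv_file):
    """Return {header: count} for names that appear more than once
    in the first row of an open CSV file object."""
    header = next(csv.reader(csv_file), [])
    counts = Counter(header)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up sample mimicking the truncated headers described in this issue.
sample = io.StringIO(
    "Plant Id,Coalmine,Coalmine,Coalmine,Coalmine,Quantity\n"
    "1,a,b,c,d,100\n"
)
dupes = duplicate_headers(sample)
print(dupes)
```

For a file on disk, open it with `open(path, newline="")` and pass the file object to the helper; a non-empty result suggests the CSV was written before the headers were cleaned.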

dt-woods commented 1 month ago

The worksheet:

Screenshot 2024-10-09 at 12 56 49

And the reduced CSV:

Screenshot 2024-10-09 at 12 57 03

dt-woods commented 1 month ago

Note: to implement this fix on your machine, you need to delete any CSV files in the f923_YEAR folders in your data directory.
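
The cleanup in the note above could be scripted roughly as follows. This is a sketch under assumptions: remove_cached_csvs is a hypothetical helper, the location of the data directory is not stated in this issue and must be supplied by the user, and the glob pattern assumes the folders are named f923_ followed by the year. The demo runs against a throwaway temporary directory, not a real data directory.

```python
import tempfile
from pathlib import Path


def remove_cached_csvs(data_dir, dry_run=True):
    """List (and, unless dry_run, delete) CSV files found directly inside
    f923_* folders under data_dir. Returns the matched paths."""
    targets = sorted(Path(data_dir).glob("f923_*/*.csv"))
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


# Demo on a temporary directory with an assumed folder layout.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "f923_2020"
    folder.mkdir()
    (folder / "reduced.csv").write_text("Coalmine,Coalmine\n")
    (folder / "workbook.xlsx").write_bytes(b"")
    removed = [p.name for p in remove_cached_csvs(tmp, dry_run=False)]
    survivors = sorted(p.name for p in folder.iterdir())

print(removed, survivors)
```

Running with the default `dry_run=True` first shows which files would be removed without deleting anything; the Excel workbooks are left in place so the CSVs can be regenerated.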