USEPA / ElectricityLCI

Creative Commons Zero v1.0 Universal

Plant/Facility ID type mismatch between eGRID and 923 #42

gschivley closed this issue 5 years ago

gschivley commented 5 years ago

When trying to run lines from the build_model.py script, I encountered the following exception:

>>> from electricitylci.globals import output_dir,model_name
>>> all_generation_db = electricitylci.get_generation_process_df(regions='all')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/greg/Documents/NETL work/ElectricityLCI/electricitylci/__init__.py", line 5, in get_generation_process_df
    generation_process_df = create_generation_process_df(electricity_for_selected_egrid_facilities,emissions_and_waste_for_selected_egrid_facilities,subregion=regions)
  File "/Users/greg/Documents/NETL work/ElectricityLCI/electricitylci/generation.py", line 42, in create_generation_process_df
    database_with_new_generation = combined_data.merge(EIA_923_gen_data, left_on = ['eGRID_ID'],right_on = ['Plant Id'],how = 'left')
  File "/Users/greg/miniconda3/envs/elci/lib/python3.7/site-packages/pandas/core/frame.py", line 6389, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "/Users/greg/miniconda3/envs/elci/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/Users/greg/miniconda3/envs/elci/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 555, in __init__
    self._maybe_coerce_merge_keys()
  File "/Users/greg/miniconda3/envs/elci/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 986, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Upon inspection it appears that the eGRID FacilityID values are strings and the EIA923 Plant Id values are integers.
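The mismatch can be reproduced outside the package with two toy frames. This is only a minimal sketch: the column names `eGRID_ID` and `Plant Id` come from the traceback above, while the data values, the `Emissions` and `Net Generation` columns, and the helper `merge_on_plant_id` are made up for illustration and are not part of electricitylci.

```python
import pandas as pd

# Toy stand-ins for the two tables; all values are invented for illustration.
egrid = pd.DataFrame({"eGRID_ID": ["3", "7", "10"], "Emissions": [1.0, 2.0, 3.0]})
eia923 = pd.DataFrame({"Plant Id": [3, 7, 10], "Net Generation": [100, 200, 300]})


def merge_on_plant_id(left, right):
    """Hypothetical helper: cast both key columns to str, then left-merge."""
    left = left.assign(eGRID_ID=left["eGRID_ID"].astype(str))
    right = right.assign(**{"Plant Id": right["Plant Id"].astype(str)})
    return left.merge(right, left_on=["eGRID_ID"], right_on=["Plant Id"], how="left")


# Merging string keys against integer keys is what triggers the ValueError
# shown in the traceback above.
try:
    egrid.merge(eia923, left_on=["eGRID_ID"], right_on=["Plant Id"], how="left")
except ValueError as err:
    print(f"merge failed: {err}")

# With both keys cast to str, the same merge goes through.
merged = merge_on_plant_id(egrid, eia923)
print(merged)
```

Casting at merge time is a workaround; fixing the dtype where the file is read keeps the keys consistent everywhere else too.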

WesIngwersen commented 5 years ago

@gschivley I do not get this error, but I agree this is a potential issue. According to the stewi format specs, FacilityIDs are always strings, and I tried to stick to that rule in electricitylci. In this case I just modified eia923_generation.py at line 45 so that, if the EIA923 csv file already exists on the user's machine, it is read in with Plant Id set to a string. If the file doesn't exist, Plant Id was already set to a string by the Excel read at line 27.
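A small sketch of why the cached csv path can behave differently from the Excel path: without an explicit dtype, pandas infers integers for a column of plant ids. The csv text below is a made-up two-row extract, not real EIA923 data, and the variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row extract standing in for the cached EIA923 csv file.
csv_text = "Plant Id,YEAR,Net Generation\n3,2016,100\n10,2016,300\n"

# Default read: pandas infers an integer dtype for 'Plant Id'.
inferred = pd.read_csv(io.StringIO(csv_text))

# Read with an explicit dtype so 'Plant Id' matches the string FacilityIDs.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"Plant Id": str})

print(inferred["Plant Id"].dtype)
print(as_str["Plant Id"].tolist())
```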

gschivley commented 5 years ago

Good to know. I've always used EIA plant id/codes as integers but maybe they are better as strings.

Probably not worth the effort, but parameter dictionaries for reading each file type could be stored in a parameters file. Could even include the expected file path.

eia923_gen_fuel_excel_kwargs = dict(
    sheet_name='Page 1 Generation and Fuel Data',
    header=5,
    na_values=['.'],
    dtype={'Plant Id': str,
           'YEAR': str}
)

eia923_gen_fuel_csv_kwargs = dict(
    dtype={'Plant Id': str,
           'YEAR': str}
)

These would be used as kwargs when reading data files.

import pandas as pd

# eia923_path is the path to the EIA923 Excel file.
eia = pd.read_excel(eia923_path, **eia923_gen_fuel_excel_kwargs)
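One way the parameters-file idea could look, including the expected file path, is sketched below. Everything here is hypothetical: the `FILE_PARAMS` registry, the `read_data_file` helper, and the placeholder path are illustrative names, not part of electricitylci. The demo reads from an in-memory string so that it runs without the real data file.

```python
import io

import pandas as pd

# Hypothetical registry: one entry per file type, holding the reader function,
# the expected path, and the keyword arguments for that reader.
FILE_PARAMS = {
    "eia923_gen_fuel_csv": {
        "reader": pd.read_csv,
        "path": "EIA923_gen_fuel.csv",  # placeholder path, not the real location
        "kwargs": {"dtype": {"Plant Id": str, "YEAR": str}},
    },
}


def read_data_file(file_type, source=None):
    """Read a data file using its stored parameters.

    `source` overrides the expected path, e.g. with a file-like object.
    """
    params = FILE_PARAMS[file_type]
    target = params["path"] if source is None else source
    return params["reader"](target, **params["kwargs"])


# Demo with made-up data in place of the real csv file.
demo = read_data_file(
    "eia923_gen_fuel_csv", source=io.StringIO("Plant Id,YEAR\n3,2016\n")
)
print(demo.dtypes)
```

Keeping the dtypes in one place means the csv and Excel readers cannot drift apart, which is the failure mode described earlier in this thread.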