Update packages - Githubissues

jenhagg commented 2 years ago

Purpose

Original goal was to bump pandas to be consistent with powersimdata. This required installing openpyxl which is now used to read excel files instead of xlrd (maybe this can be removed?), and updating the version of python-dateutil. There were also a bunch of warnings to fix, and 2 tests failing in test_decompose_profile.py. I fixed these by setting the index to plant_id, but that caused other tests in that file to fail (if done at the module level), so I could use some help to determine if that's fine, or there is a better approach.

What the code is doing

See above. Other changes are mainly to omit the code coverage locally while still having it run by github actions.

Testing

tox

Time estimate

10-20 min?

danielolsen commented 2 years ago

This required installing openpyxl which is now used to read excel files instead of xlrd (maybe this can be removed?)

It seems that we still use Excel-file-reading in two places:

Reading data from a copy of EIA Form 923 data in the repo (EIA923_Schedules_2_3_4_5_M_12_2016_Final_Revision.xlsx), used for generating hydro profiles (?)
Reading Excel files downloaded from the EIA as part of demand profile building (prereise.gather.demanddata.eia.get_eia_data).

I think both of these functions could be refactored inplace to perform the same functions without Excel, by converting the sheet(s) of EIA Form 923 that we care about into CSVs, and by grabbing EIA data in JSON format via their API rather than as Excel files, but we are also in the process of developing replacements to these anyway so I'm not sure whether it's worth it.

rouille commented 2 years ago

This required installing openpyxl which is now used to read excel files instead of xlrd (maybe this can be removed?)

It seems that we still use Excel-file-reading in two places:
* Reading data from a copy of EIA Form 923 data in the repo (`EIA923_Schedules_2_3_4_5_M_12_2016_Final_Revision.xlsx`), used for generating hydro profiles (?)

* Reading Excel files downloaded from the EIA as part of demand profile building (`prereise.gather.demanddata.eia.get_eia_data`).
I think both of these functions could be refactored inplace to perform the same functions without Excel, by converting the sheet(s) of EIA Form 923 that we care about into CSVs, and by grabbing EIA data in JSON format via their API rather than as Excel files, but we are also in the process of developing replacements to these anyway so I'm not sure whether it's worth it.

It seems that xlrd no longer allow to read anything but xls files. This means that the loading of xlsx files into data frame through pandas.read_excel will fail depending of the version of xlrd installed. We likely need refactor the code to use openpyxl which seems to be more flexible. See list of files below where pandas.read_excel is used:

prereise/gather/demanddata/eia/demo/eastern_demand_v6_demo/spp/split_spp_into_subareas.ipynb:    "county_list_owner = pd.read_excel('spp_counties_owners.xlsx', index_col=0)\n",
prereise/gather/demanddata/eia/demo/eastern_demand_v6_demo/miso/generate_miso_subarea_profile.ipynb:    "lrz_demand_2016 = pd.read_excel('20161231_dfal_hist.xls', skiprows=5)\n",
prereise/gather/demanddata/eia/demo/eastern_demand_v6_demo/miso/generate_miso_subarea_profile.ipynb:    "lrz_demand_2015 = pd.read_excel('20151231_dfal_hist.xls', skiprows=5)\n",
prereise/gather/demanddata/eia/get_eia_data.py:        df = pd.read_excel(
prereise/gather/helpers.py:    eia_form_923 = pd.read_excel(
prereise/gather/hydrodata/eia/demo/eastern_hydro_v3_demo/eastern_hydro_v3_demo.ipynb:    "eastern_hps = pd.read_excel(io='./hps_plants_eastern.xlsx',sheet_name = 'all_plantIDs',header = 0)\n",

jenhagg commented 2 years ago

We likely need refactor the code to use openpyxl which seems to be more flexible.

Seems like pandas will automatically use openpyxl if installed. One of the tests was actually failing and was fixed by doing this.

rouille commented 2 years ago

We likely need refactor the code to use openpyxl which seems to be more flexible.

Seems like pandas will automatically use openpyxl if installed. One of the tests was actually failing and was fixed by doing this.

Then, I would say we can remove xlrd from the requirements

Breakthrough-Energy / PreREISE

Update packages #275

Purpose

What the code is doing

Testing

Time estimate