Breakthrough-Energy / PreREISE

Generate input data for scenario framework
https://breakthrough-energy.github.io/docs/
MIT License

Bug report: EIA API for demand data is timing out #322

Open victoriahunt opened 1 year ago

victoriahunt commented 1 year ago

Bug summary

The EIA API is proposed as the source of demand data for the HIFLD Project, as described in Issue #293. I am trying to retrieve data for the list of BAs created in issue #241. During the week of Oct 31 - Nov 4, I attempted to use the EIA API to retrieve data for that BA list, with varying degrees of success because it times out at unpredictable intervals. When it times out, it throws an error. It is also slow, taking >10 minutes per BA in some cases.

Code for reproduction

import getpass

import pandas as pd
from prereise.gather.demanddata.eia.get_eia_data import get_ba_demand

# Hourly range covering all of 2016
start = pd.to_datetime('2016-01-01 00:00:00')
end = pd.to_datetime('2016-12-31 23:00:00')

# Prompt for the EIA API key without echoing it to the terminal
key = getpass.getpass(prompt='api_key=')
# BA shp list is list of BAs from
get_ba_demand(ba_shplist, start, end, key)

Actual outcome

[Screenshot: Screen Shot 2022-11-04 at 3 31 55 PM]

Other possible error I've observed: IncompleteRead: IncompleteRead(303104 bytes read)

Expected outcome

What it looks like when data downloads successfully from the API:

[Screenshot: Screen Shot 2022-11-04 at 4 00 03 PM]

Additional context

As of Nov 4, 2022, the API site shows multiple warnings that may be contributing to these issues. One announces 'scheduled maintenance' on Nov 4, but the issue also happened on other days this week. The other is likely more important:

"Notice: EIA will discontinue support for its legacy API (APIv1) in November, 2022. Excel add-in v1 sheets will continue to function as they are. Please refer to our documentation for the APIv2 interaction methods and our APIv2 query browser to view the data."

This second warning may be contributing to the issues with the API and likely requires a long-term fix.

[Screenshot: Screen Shot 2022-11-04 at 4 04 17 PM]

BainanXia commented 1 year ago

Based on the discussion with @victoriahunt, the bug is caused by an unknown issue with the current EIA API that makes the download process unstable. According to the official notice, EIA APIv2 has been released and support for the current API will be discontinued after Nov. 2022. Hence, we will need to update the download function in the code base to reflect the change. According to the APIv2 documentation, it seems all we need to do is update the URL from "http://api.eia.gov/series/?api_key=" to "http://api.eia.gov/v2/series/?api_key=". However, this hasn't been tested yet.

victoriahunt commented 1 year ago

@BainanXia Unfortunately, I have new evidence that simply updating the URL doesn't work. I changed the API URL to the new one and ran the eastern_demand_v5_demo notebook, otherwise unchanged from the develop branch, and got the following error:

[Screenshot: Screen Shot 2022-11-08 at 10 23 54 AM]

rouille commented 1 year ago

The documentation for the new API can be found here. The route has changed. It seems that demand data can be found through: https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/?api_key=

It looks like only data from 06/15/2018 and later are available
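
As a rough sketch of what a request against the new route might look like, the snippet below only builds the query URL and sends nothing. The route is the one quoted above. Everything else is an assumption based on my reading of the APIv2 documentation and has not been tested against the live API: the trailing `data/` segment, the `frequency`, `data[0]`, `facets[parent][]`, `start`, `end`, `offset` and `length` parameters, and the hourly date format. The function name `build_v2_demand_url` is hypothetical and not part of PreREISE.

```python
from datetime import datetime
from urllib.parse import urlencode

# Route quoted in the comment above. The "data/" suffix and all query
# parameters below are assumptions, not verified against the live API.
V2_ROUTE = "https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/"
# Apparent start of availability on this route, per the observation above
EARLIEST = datetime(2018, 6, 15)


def build_v2_demand_url(parent_ba, start, end, api_key, offset=0, length=5000):
    """Build (but do not send) a query URL for one BA (hypothetical helper)."""
    if start < EARLIEST:
        raise ValueError(f"no data before {EARLIEST.date()} on this route")
    params = [
        ("api_key", api_key),
        ("frequency", "hourly"),
        ("data[0]", "value"),
        ("facets[parent][]", parent_ba),  # assumed facet name for the BA
        ("start", start.strftime("%Y-%m-%dT%H")),
        ("end", end.strftime("%Y-%m-%dT%H")),
        ("offset", offset),
        ("length", length),
    ]
    # urlencode percent-encodes the square brackets in the parameter names
    return V2_ROUTE + "data/?" + urlencode(params)
```

If the date limit holds, a 2016 request like the one in the bug report would raise `ValueError` here, so the 2016 demand data would have to come from another source.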

[Screenshot: Screen Shot 2022-11-08 at 1 12 41 PM]

victoriahunt commented 1 year ago

Looping through the BAs one at a time, with a short 'sleep' pause between requests, works (using the old API URL):

import time

listname = []
for ba in ba_shplist:
    # Request a single BA per call, then pause before the next request
    temp = get_ba_demand([ba], start, end, key)
    listname.append(temp)
    time.sleep(20)

Note that it takes between one and two minutes on average per BA using this code on my device, so running through 120+ BAs takes more than 2 hours -- but it doesn't time out. Of course this may not work if/once the API URL support is dropped by EIA.
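
One possible refinement of the loop, sketched here and untested against the API, is to retry a BA a few times when a request fails instead of letting one timeout or `IncompleteRead` stop the whole run. The helper name `fetch_each_with_retry` and its parameters are hypothetical. It takes the per-BA call as an argument, so it does not depend on how `get_ba_demand` behaves internally.

```python
import time


def fetch_each_with_retry(items, fetch_one, pause=20, max_tries=3, sleep=time.sleep):
    """Call fetch_one(item) per item, pausing between items and retrying failures.

    Returns (results, failed): a dict of successful results keyed by item and
    a list of items that still failed after max_tries attempts.
    """
    results, failed = {}, []
    for item in items:
        for attempt in range(1, max_tries + 1):
            try:
                results[item] = fetch_one(item)
                break
            except Exception:  # e.g. a timeout or IncompleteRead
                if attempt == max_tries:
                    failed.append(item)
                else:
                    sleep(pause * attempt)  # wait longer after each failure
        sleep(pause)  # pause between items, as in the loop above
    return results, failed
```

With the variables from the loop above, it could be called as `results, failed = fetch_each_with_retry(ba_shplist, lambda ba: get_ba_demand([ba], start, end, key))`, and any BAs left in `failed` can be rerun later.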