catalyst-cooperative / pudl-scrapers

Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.
MIT License

Cache eia api v2 #22

Closed: TrentonBush closed this 2 years ago

TrentonBush commented 2 years ago

Draft implementation of EIA API downloader. There are a few warts and works in progress:

I'm wondering if it would be easier to use the bulk download and drop the parts we don't need than to fix this up. I think it would be more reliable, faster to download, faster to develop, and simpler, at the cost of some bandwidth and (temporary!) disk space.

zaneselvans commented 2 years ago

Welp, okay. An hour to download one small subset of the electricity data seems absurd. We should verify that the bulk download has the same information (hopefully they update it automatically...) and just use that. I was able to download the entire 192 MB zipped JSON archive in 26 seconds, from Mexico over a VPN, so... yeah. I guess we'll just archive that instead and extract the series we need afterward. 🙄 But at least we'll have all of them! That assumes it's updated automatically and really contains everything available through the v2 API. Have you looked at its contents already?
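For illustration, here's a rough sketch of the "archive the bulk file, then extract the series we need" step. It assumes the bulk archive is a zip containing a newline-delimited JSON file (one object per line), where series records carry a `series_id` key and other records (e.g. category listings) don't. The member name `ELEC.txt`, the series IDs, the toy data values, and the helper names `iter_bulk_series` / `extract_series` are hypothetical placeholders, not PUDL's actual code or a confirmed description of the EIA file layout.

```python
import io
import json
import zipfile
from typing import Iterable, Iterator


def iter_bulk_series(zip_bytes: bytes, member: str) -> Iterator[dict]:
    """Yield series records from a zipped, newline-delimited JSON file.

    Assumption (hypothetical layout): each line is one JSON object, and
    series records carry a "series_id" key; all other records are skipped.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(member) as fh:
            # Stream line by line so the whole file is never parsed at once.
            for raw in io.TextIOWrapper(fh, encoding="utf-8"):
                line = raw.strip()
                if not line:
                    continue
                record = json.loads(line)
                if "series_id" in record:
                    yield record


def extract_series(zip_bytes: bytes, member: str, prefixes: Iterable[str]) -> dict[str, dict]:
    """Keep only the series whose IDs start with one of the given prefixes."""
    wanted = tuple(prefixes)
    return {
        rec["series_id"]: rec
        for rec in iter_bulk_series(zip_bytes, member)
        if rec["series_id"].startswith(wanted)
    }


# Demo with a tiny in-memory archive; member name, IDs, and values are made up.
demo_records = [
    {"category_id": 0, "name": "a category record, skipped"},
    {"series_id": "ELEC.GEN.ALL-US-99.A", "data": [["2020", 1.0]]},
    {"series_id": "ELEC.PRICE.US-ALL.A", "data": [["2020", 2.0]]},
]
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ELEC.txt", "\n".join(json.dumps(r) for r in demo_records))

subset = extract_series(buf.getvalue(), "ELEC.txt", ["ELEC.GEN."])
print(sorted(subset))  # ['ELEC.GEN.ALL-US-99.A']
```

Streaming the file line by line keeps memory use flat even though the whole archive is downloaded, which fits the "some bandwidth and temporary disk space" trade-off discussed above.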

zaneselvans commented 2 years ago

I've got the full eia_bulk_elec data source in the datastore now, so I think this can be closed.