PyPSA / pypsa-usa

PyPSA-USA: An Open-Source Energy System Optimization Model for the United States
https://pypsa-usa.readthedocs.io
MIT License

Refactor `build_demand.py` to use EIA930 data from PUDL #313

Status: Open. jpvelez opened this issue 1 month ago.

jpvelez commented 1 month ago

Feature Request

Catalyst Coop is currently in the midst of adding EIA930 data to PUDL.

They just produced their first nightly build that includes (relatively unprocessed) hourly EIA-930 balancing authority operations data, including the following 4 tables:

Docs are here, notebook poking at the data here.

Once the tables graduate to out_eia930_*, meaning they have been fully cleaned up and are production ready, we'll need to refactor build_demand to pull EIA930 data from PUDL.

Suggested Solution

ktehranchi commented 1 month ago

One way to handle this, and to avoid duplicating work between this ticket and #314, would be to write a function in _helper.py that reads the PUDL SQL database file and extracts the EIA930 data into a pd.DataFrame in the same format we currently import into both plot_validation_figures and build_demand.

plot_validation_figures queries the generation and imports/exports columns of the EIA930 data, whereas build_demand queries only the demand.
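A minimal sketch of such a shared helper, assuming the EIA930 data sits in a long-format SQLite table with one row per balancing authority and hour. The function name `load_eia930`, the table name, and all column names (`datetime_utc`, `balancing_authority_code`, `demand_mwh`, `net_generation_mwh`, `interchange_mwh`) are hypothetical placeholders, not the actual PUDL schema. Each caller passes only the columns it needs, and the result is pivoted to a wide, hourly frame.

```python
"""Hypothetical _helper.py function: load EIA930 data from a SQLite file."""
import sqlite3

import pandas as pd

# Hypothetical column subsets; the real PUDL column names may differ.
DEMAND_COLUMNS = ["demand_mwh"]  # used by build_demand
VALIDATION_COLUMNS = ["net_generation_mwh", "interchange_mwh"]  # plot_validation_figures


def load_eia930(
    con: sqlite3.Connection,
    table: str,
    value_columns: list[str],
) -> pd.DataFrame:
    """Return an hourly, wide DataFrame indexed by timestamp.

    Columns are a MultiIndex of (value column, balancing authority).
    """
    cols = ", ".join(["datetime_utc", "balancing_authority_code", *value_columns])
    # Identifiers are interpolated into the SQL, so pass only trusted names.
    query = f"SELECT {cols} FROM {table}"
    long_df = pd.read_sql_query(query, con, parse_dates=["datetime_utc"])
    # One column per (value, balancing authority) pair, one row per hour.
    wide = long_df.pivot(
        index="datetime_utc",
        columns="balancing_authority_code",
        values=value_columns,
    )
    return wide.sort_index()
```

With this shape, build_demand could take `load_eia930(con, table, DEMAND_COLUMNS)["demand_mwh"]` to get one demand column per balancing authority, while plot_validation_figures requests `VALIDATION_COLUMNS` from the same function.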

zaneselvans commented 1 month ago

Our first round of work with GridLab is pretty much done now, and Ana and Elaine wanted the relatively unprocessed EIA-930, so I don't have a timeline for when a more cleaned up set of EIA-930 output tables will be available. Though if someone wanted to adapt one of the existing modules from Jacques at Stanford or the Open Grid Emissions project into a more processed EIA-930 table in PUDL, that would be useful. There's a lot of stuff going on in the EIA-930 that makes it complicated to use as-is depending on what you need. We could also try depending directly on https://github.com/jdechalendar/gridemissions/

Recently we got to the point where the number of multi-million-row hourly tables was just too much for SQLite to be convenient, so now all the hourly outputs are only in Parquet. The best place to pull bulk PUDL outputs from is the PUDL AWS Open Data Registry S3 bucket: s3://pudl.catalyst.coop/ We haven't done a release with the new data from the GridLab project yet, so you'll only find them in /nightly for now.

ktehranchi commented 1 month ago

Good to know. I think for our purposes it is useful to have Jacques's (GridEmissions) physics-based reconciliation of the EIA930 data, just to make sure some basics (like supply = demand) hold true in the data.
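To illustrate the kind of basic check meant here, the sketch below flags hours where reported demand disagrees with net generation minus total interchange. It assumes the EIA-930 sign convention in which positive interchange means net exports, and it uses hypothetical column names and an arbitrary 5% tolerance. It only detects imbalances; it is not the GridEmissions reconciliation, which adjusts the data rather than just flagging it.

```python
import pandas as pd


def flag_imbalance(df: pd.DataFrame, rel_tol: float = 0.05) -> pd.Series:
    """Return True for rows where demand != net generation - net interchange.

    Assumes positive interchange = net exports; column names are hypothetical.
    Rows with missing values compare as NaN and are therefore not flagged.
    """
    expected = df["net_generation_mwh"] - df["interchange_mwh"]
    residual = (df["demand_mwh"] - expected).abs()
    # Relative error against demand; clip avoids dividing by zero.
    scale = df["demand_mwh"].abs().clip(lower=1.0)
    return residual / scale > rel_tol
```

For example, an hour with 100 MWh demand, 110 MWh net generation, and 10 MWh net exports balances exactly, whereas 150 MWh of generation with the same demand and exports leaves a 40% residual and is flagged.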

We are already pulling that data directly from his AWS S3 bucket. Unless you, @jpvelez, want to port work from Jacques's GridEmissions tool into PUDL, I think we can table this ticket for now.