catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Initial transform of EIA930 tables #3505

Closed e-belfer closed 2 months ago

e-belfer commented 3 months ago

Create a basic set of core_eia930 assets sufficient for the GridLab MVP.

- [x] Switch to using April `eia930` DOI in datastore.
- [x] Add `core_eia930__hourly_balancing_authority_demand` asset
- [x] Add `core_eia930__hourly_balancing_authority_net_generation` asset
- [x] Add `core_eia930__hourly_balancing_authority_interchange` asset
- [x] Add `core_eia930__hourly_subregion_demand` asset
- [x] Add `core_eia930__assn_balancing_authority_subregion` asset
- [x] Add `core_eia930__assn_balancing_authority_region` asset
- [x] Standardize column names in interchange table column maps
- [x] Define schemas for each of these tables
- [x] Write EIA-930 tables to SQLite (~18min; adds 50% / 8GB to pudl.sqlite)
- [x] If SQLite output too slow/big, try Parquet only (90sec; 260MB)
- [x] DATA: Evaluate consistency of BA codes found in EIA860, EIA923, & EIA930
- [x] DOC: Are `generation`/`demand`/`interchange` really ~MW~? Or MWh?
- [x] DOC: Clarify whether these hourly timestamps are Hour Ending or Hour Starting
- [x] DOC: Clarify meanings of `reported`, `adjusted`, `imputed` and `forecast` demand/interchange/generation
- [x] DOC: Verify that `demand_reported_mw` for BA and subregions actually mean the same thing
- [x] DOC: Flesh out real table descriptions
- [x] ENG: Address access/database consequences of only writing hourly EIA-930 data to Parquet
- [x] ENG: copy the EIA-930 parquet outputs out of `parquet` folder for distribution in nightly builds
- [x] Figure out whether day-ahead demand forecast timestamps mark when the forecast was made or the hour being forecast

# Out of Scope

- [x] DATA: Remove local timestamps and record the timezone associated with each BA's reporting
- [x] ENG: Define allowed `region_code_eia` values in the BA coding table
- [ ] DATA: [Evaluate discrepancy between reported total & per-fuel generation](https://github.com/catalyst-cooperative/pudl/pull/3584#issuecomment-2067908462) (it is not small!)
- [ ] DATA: [Evaluate consistency of aggregated subregion demand and BA demand](https://github.com/catalyst-cooperative/pudl/pull/3584#issuecomment-2067908462)
- [ ] ENG/DATA: Define additional asset checks to validate the new tables
- [ ] ENG: Harvest BA codes from EIA-860, EIA-861, EIA-923, and EIA-930
- [ ] DATA: Create a coding table for cleaning and documenting BA subregion codes
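
The subregion-versus-BA demand comparison listed above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the PUDL implementation: the function name `subregion_vs_ba_demand`, the `ba_code` and `datetime_utc` keys, the placeholder BA code `"XX"`, and the sample values are all assumptions made for the example. Only `demand_reported_mw` comes from the checklist.

```python
from collections import defaultdict


def subregion_vs_ba_demand(ba_rows, subregion_rows):
    """Return {(ba_code, timestamp): BA demand minus summed subregion demand}.

    Hypothetical helper; assumes every row has a non-null demand value.
    """
    # Sum the subregion demand for each (BA, hour) pair.
    totals = defaultdict(float)
    for row in subregion_rows:
        totals[(row["ba_code"], row["datetime_utc"])] += row["demand_reported_mw"]

    # Compare only the BA-hours that have at least one subregion record.
    diffs = {}
    for row in ba_rows:
        key = (row["ba_code"], row["datetime_utc"])
        if key in totals:
            diffs[key] = row["demand_reported_mw"] - totals[key]
    return diffs


# Made-up sample data: one BA-hour split across two subregions.
ba_rows = [
    {"ba_code": "XX", "datetime_utc": "2024-04-01T00:00", "demand_reported_mw": 100.0},
]
subregion_rows = [
    {"ba_code": "XX", "subregion": "A", "datetime_utc": "2024-04-01T00:00", "demand_reported_mw": 60.0},
    {"ba_code": "XX", "subregion": "B", "datetime_utc": "2024-04-01T00:00", "demand_reported_mw": 35.0},
]
diffs = subregion_vs_ba_demand(ba_rows, subregion_rows)
```

A nonzero difference for a BA-hour flags a discrepancy worth investigating; on the real tables the same idea would more naturally be a dataframe group-by and join.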

zaneselvans commented 2 months ago

Hey @grgmiller & @jdechalendar, do either of you happen to know whether the EIA-930 timestamps associated with the day-ahead demand forecasts are the time at which the forecast is made, or the time of the demand being predicted? Is there a general convention for day-ahead forecasts? For example, is the prediction always made at time X for time X+24h?

jdechalendar commented 2 months ago

I've never used these forecasts so I'm useless here - sorry!

grgmiller commented 2 months ago

My understanding is that the timestamp refers to the forecasted interval. You can confirm this in one of the regional Excel files, which contain demand forecast data for timestamps that have not yet occurred.
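
For anyone who wants to repeat that check programmatically, here is a rough sketch in plain Python. It looks for records whose timestamp is later than a chosen reference time and that still carry a forecast value, which can only happen if the timestamp labels the forecasted hour. The `forecast_only_hours` function, the `datetime_utc` and `demand_forecast_mw` keys, and the sample values are assumptions for illustration, not names taken from the EIA files.

```python
from datetime import datetime, timedelta, timezone


def forecast_only_hours(rows, now):
    """Return timestamps later than `now` that carry a demand forecast.

    Hypothetical helper; `rows` is a list of dicts with assumed keys.
    """
    return [
        row["datetime_utc"]
        for row in rows
        if row["datetime_utc"] > now and row["demand_forecast_mw"] is not None
    ]


# Made-up sample: two past/current hours plus one hour that has not occurred yet.
now = datetime(2024, 4, 1, 12, tzinfo=timezone.utc)
rows = [
    {"datetime_utc": now - timedelta(hours=1), "demand_reported_mw": 1000.0, "demand_forecast_mw": 990.0},
    {"datetime_utc": now, "demand_reported_mw": 1020.0, "demand_forecast_mw": 1010.0},
    {"datetime_utc": now + timedelta(hours=5), "demand_reported_mw": None, "demand_forecast_mw": 1100.0},
]
future = forecast_only_hours(rows, now)
```

A non-empty result means forecast values exist for hours after the reference time, which is consistent with the timestamp marking the forecasted interval rather than the time the forecast was issued.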