The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
# Description
Preparations for our 7-15 minute recurring demonstration of PUDL at OpenMod US 2023. We want to show folks how easy it is to access and work with the data we publish using Jupyter notebooks, Datasette, nightly build outputs, etc.
# Motivation
- Make people aware of, and excited about, working with the open data we publish.
- Give people enough of an intro that they feel able to play with the data on their own after the conference.
- The target audience is folks who already have some domain knowledge (OpenMod attendees) but may have a variety of technical backgrounds and familiarity with different sets of tools.
# Scope
- The PUDL Dataset on Kaggle is well documented (a Kaggle usability score of 9+ out of 10?).
- The PUDL Dataset on Kaggle is being automatically updated based on nightly builds.
- The notebooks associated with the PUDL Dataset on Kaggle are being automatically tested as the data evolves.
- Our Datasette deployment is working and can handle a bit of a spike in new usage.
- We are able to capture and analyze PUDL usage that results from this outreach.
- We have a 7-15 minute demonstration that we can run through with a new user which covers:
  - Interactive access & computation via Jupyter notebooks on Kaggle
  - Browsing and querying of data on Datasette
  - Bulk data download from the AWS Open Data Registry for local usage (see the sketch after this list)
  - Bulk data download from a versioned Zenodo archive
  - Data Dictionaries that annotate the data on Read the Docs
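For the bulk AWS download step, a minimal sketch using `s3fs`; the bucket name and object path below are assumptions about how the nightly build outputs might be laid out, not confirmed locations:

```python
# Sketch: pull a nightly build output from the AWS Open Data Registry bucket.
# The bucket name and object path are assumptions and may not match reality.
import s3fs

fs = s3fs.S3FileSystem(anon=True)  # public open-data buckets allow anonymous access
print(fs.ls("pudl.catalyst.coop"))  # browse what is actually available first

# Hypothetical path to a nightly pudl.sqlite; adjust to whatever ls() shows.
fs.get("pudl.catalyst.coop/nightly/pudl.sqlite", "pudl.sqlite")
```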
# Out of Scope
- Introducing users to PUDL development environment setup.
- Introducing users to running the back-end / Dagster.
# Comanche Notebook Outline
- Given narrative context around the plant, how do we find it in the data?
- Create a table with some basic summary information about CO coal-fired generators.
- Make a map of CO coal plants in 2010 vs. 2022.
- Group generators by plant and primary fuel type and sum capacity (see the first sketch after this outline).
- Now that we know the EIA plant ID is 470 and the generators are 1, 2, and 3, dig in there.
- Using monthly EIA-923 data (second sketch after this outline), show:
  - total net generation in MWh
  - total fuel consumption in MMBTU
  - heat rate (thermal efficiency) in MMBTU / MWh
  - fuel costs in $/MWh
  - capacity factor
- Using annual FERC Form 1 data (third sketch after this outline), show:
  - annually averaged non-fuel operating costs in $/MWh
  - annually averaged CapEx in $/MW of capacity
  - Note that fuel consumption, fuel cost, and net generation are also available in FERC 1, but are not as granular or reliable as the EIA-923 data.
- Highlight the existence of multiple ownership slices and complicated reporting if it shows up.
- Using hourly EPA CEMS data (fourth sketch after this outline):
  - Compare CEMS-derived monthly net generation, fuel consumption, capacity factors, and implied heat rates with those we got from EIA-923.
  - Using the hourly data, look at the structure of outages / operational loads.
  - Highlight the frequent outages for unit 3: the low capacity factor isn't because of ramping; the unit is either on or off.
  - Calculate emissions.
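A minimal sketch of the capacity summary step, assuming a local copy of `pudl.sqlite`; the table and column names (`generators_eia860`, `plants_eia860`, `capacity_mw`, `energy_source_code_1`, ...) are illustrative and may not match the current schema, especially after the table renames:

```python
# Sketch: summarize Colorado coal-fired generators from a local pudl.sqlite.
# Table and column names are assumptions and may differ from the real schema.
import sqlite3

import pandas as pd

conn = sqlite3.connect("pudl.sqlite")
gens = pd.read_sql(
    """
    SELECT g.report_date, g.plant_id_eia, g.generator_id,
           g.capacity_mw, g.energy_source_code_1,
           p.plant_name_eia, p.state
    FROM generators_eia860 AS g
    JOIN plants_eia860 AS p USING (plant_id_eia, report_date)
    WHERE p.state = 'CO'
    """,
    conn,
    parse_dates=["report_date"],
)

# Most recent reporting year only, coal primary fuel codes only.
latest = gens[gens.report_date == gens.report_date.max()]
coal = latest[latest.energy_source_code_1.isin(["BIT", "SUB", "LIG", "WC"])]

# Sum capacity by plant and primary fuel type.
capacity_mw = (
    coal.groupby(["plant_id_eia", "plant_name_eia", "energy_source_code_1"])
    .capacity_mw.sum()
    .sort_values(ascending=False)
)
print(capacity_mw.head(10))  # Comanche (plant_id_eia 470) should show up near the top
```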
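For the EIA-923 metrics, a sketch of the arithmetic, again with assumed table and column names (`generation_fuel_eia923`, `net_generation_mwh`, `fuel_consumed_mmbtu`) and nameplate capacity pulled from the assumed EIA-860 generators table:

```python
# Sketch: monthly heat rate and capacity factor for Comanche (plant_id_eia 470).
# Table and column names are assumptions; check the data dictionary for the real ones.
import sqlite3

import pandas as pd

conn = sqlite3.connect("pudl.sqlite")

gf = pd.read_sql(
    "SELECT * FROM generation_fuel_eia923 WHERE plant_id_eia = 470",
    conn,
    parse_dates=["report_date"],
)
monthly = gf.groupby(pd.Grouper(key="report_date", freq="MS"))[
    ["net_generation_mwh", "fuel_consumed_mmbtu"]
].sum()

# Heat rate: MMBTU of fuel burned per MWh generated (lower = more efficient).
monthly["heat_rate_mmbtu_per_mwh"] = (
    monthly.fuel_consumed_mmbtu / monthly.net_generation_mwh
)

# Capacity factor: actual generation relative to running flat out all month,
# using total plant capacity from the (assumed) EIA-860 generators table.
cap = pd.read_sql(
    "SELECT report_date, capacity_mw FROM generators_eia860 WHERE plant_id_eia = 470",
    conn,
    parse_dates=["report_date"],
)
plant_capacity_mw = cap[cap.report_date == cap.report_date.max()].capacity_mw.sum()
hours_in_month = monthly.index.days_in_month * 24
monthly["capacity_factor"] = monthly.net_generation_mwh / (
    plant_capacity_mw * hours_in_month
)

# Fuel cost in $/MWh is the delivered fuel cost ($/MMBTU, from the fuel
# receipts and costs data) multiplied by the heat rate computed above.
print(monthly.tail(12))
```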
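For the FERC Form 1 cost metrics, a sketch assuming a steam plants table; the table name (`plants_steam_ferc1`) and cost column names are assumptions, and the FERC records are matched to Comanche crudely by plant name here rather than via the PUDL plant ID linkage:

```python
# Sketch: annual non-fuel O&M ($/MWh) and capital cost ($/MW) from FERC Form 1.
# Table and column names are assumptions and may not match the real schema.
import sqlite3

import pandas as pd

conn = sqlite3.connect("pudl.sqlite")
steam = pd.read_sql("SELECT * FROM plants_steam_ferc1", conn)

# Crude match on plant name; a real notebook would use the FERC-EIA plant linkage.
comanche = steam[steam.plant_name_ferc1.str.contains("comanche", case=False, na=False)]

annual = comanche.groupby("report_year").agg(
    opex_production_total=("opex_production_total", "sum"),
    opex_fuel=("opex_fuel", "sum"),
    capex_total=("capex_total", "sum"),
    net_generation_mwh=("net_generation_mwh", "sum"),
    capacity_mw=("capacity_mw", "sum"),
)
annual["nonfuel_opex_per_mwh"] = (
    annual.opex_production_total - annual.opex_fuel
) / annual.net_generation_mwh
annual["capex_per_mw"] = annual.capex_total / annual.capacity_mw
print(annual[["nonfuel_opex_per_mwh", "capex_per_mw"]])
```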
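For the EPA CEMS piece, a sketch of reading the hourly Parquet data with dask and rolling it up for comparison with EIA-923. The file name, column names, and string unit ID are all assumptions, and note that CEMS reports gross load rather than net generation, so some offset from EIA-923 is expected:

```python
# Sketch: hourly EPA CEMS data for Comanche via dask, rolled up to monthly.
# File and column names are assumptions and may differ from the real outputs.
import dask.dataframe as dd
import pandas as pd

cems = dd.read_parquet(
    "hourly_emissions_epacems.parquet",
    filters=[("plant_id_eia", "==", 470)],  # push the filter down to pyarrow
    columns=[
        "plant_id_eia",
        "emissions_unit_id_epa",
        "operating_datetime_utc",
        "gross_load_mw",
        "heat_content_mmbtu",
        "co2_mass_tons",
    ],
).compute()  # small enough to pull into pandas once filtered to one plant

# Monthly totals by unit; summing hourly MW values gives MWh of gross generation.
monthly = cems.groupby(
    ["emissions_unit_id_epa", pd.Grouper(key="operating_datetime_utc", freq="MS")]
)[["gross_load_mw", "heat_content_mmbtu", "co2_mass_tons"]].sum()
print(monthly.tail(12))

# Hourly gross load for unit 3 makes the on/off outage structure visible:
# the unit runs near full load or not at all, rather than ramping.
unit3 = cems[cems.emissions_unit_id_epa == "3"].set_index("operating_datetime_utc")
unit3.gross_load_mw.plot(figsize=(12, 3), title="Comanche unit 3 hourly gross load")
```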
# Minimum Requirements
- [x] Transfer ownership of PUDL dataset to Catalyst Cooperative on Kaggle (or create a new dataset if we can't transfer)
- [x] Schedule the Catalyst-owned PUDL dataset to update weekly.
- [ ] Merge the rename PR so that users see what the DB is going to look like going forward.
- [ ] Update example notebooks to work in the Kaggle python environment.
- [ ] Update example notebooks to work with the data-only outputs.
- [ ] Manually fill in dataset and file-level metadata.
- [ ] Develop a ~10 minute demonstration script.
- [ ] Ensure that table & column level previews for `pudl.sqlite` are working
# Example Notebooks
- [x] Get [PUDL Example notebooks](https://github.com/catalyst-cooperative/pudl-examples) linked to the PUDL dataset.
- [x] Schedule example notebooks to run automatically when PUDL dataset is updated to verify that they still work.
- [x] Load data from SQLite (see the sketch after this list)
- [x] Load CEMS data from Parquet efficiently using dask
- [ ] Plot some energy system operational data
- [ ] Plot some utility financial data (maybe FERC 1 large plant expenses over time?)
- [ ] Make a service territory map
- [ ] Plot state-level electricity demand estimates.
- [ ] Demonstrate the link between FERC and EIA data.
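As a complement to the checked items above, a minimal sketch of the SQLite loading step as it might look on Kaggle; the mount path and the table name queried are assumptions:

```python
# Sketch: open pudl.sqlite from the Kaggle dataset and pull a table into pandas.
# The input path and table name are assumptions, not confirmed values.
from pathlib import Path

import pandas as pd
import sqlalchemy as sa

pudl_db = Path("/kaggle/input/pudl-project/pudl.sqlite")  # hypothetical mount point
engine = sa.create_engine(f"sqlite:///{pudl_db}")

# See what tables are available, then grab a small sample of one of them.
print(sa.inspect(engine).get_table_names()[:20])
sample = pd.read_sql("SELECT * FROM plants_eia860 LIMIT 5", engine)
print(sample)
```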
# Stretch Goals
- [ ] Do a versioned data release on AWS & Zenodo
- [ ] Update `ferc-xbrl-extractor` to Frictionless v5 so we can correctly annotate the XBRL derived SQLite DBs.
- [ ] Update `pudl` to Frictionless v5 so we can correctly annotate the PUDL SQLite DB
- [ ] Create a valid `dataset-metadata.json` annotating all nightly build outputs for easier use on Kaggle.
- [ ] Create a Kaggle Organization to manage our datasets and competitions going forward