catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Prepare OpenMod PUDL Demo #2922

Closed: zaneselvans closed this issue 1 year ago

zaneselvans commented 1 year ago

Description

Preparations for our recurring 7-15 minute demonstration of PUDL at OpenMod US 2023. We want to show people how to easily access and work with the data we publish using Jupyter notebooks, Datasette, the nightly build outputs, etc.

Motivation

Scope

Out of Scope

Comanche Notebook Outline:

# Minimum Requirements
- [x] Transfer ownership of PUDL dataset to Catalyst Cooperative on Kaggle (or create a new dataset if we can't transfer)
- [x] Schedule the Catalyst-owned PUDL dataset to update weekly.
- [ ] Merge the rename PR so that users see what the DB is going to look like going forward.
- [ ] Update example notebooks to work in the Kaggle python environment.
- [ ] Update example notebooks to work with the data-only outputs.
- [ ] Manually fill in dataset and file-level metadata.
- [ ] Develop a ~10 minute demonstration script.
- [ ] Ensure that table & column level previews for `pudl.sqlite` are working.
# Example Notebooks
- [x] Get [PUDL Example notebooks](https://github.com/catalyst-cooperative/pudl-examples) linked to the PUDL dataset.
- [x] Schedule example notebooks to run automatically when PUDL dataset is updated to verify that they still work.
- [x] Load data from SQLite
- [x] Load CEMS data from Parquet efficiently using dask
- [ ] Plot some energy system operational data
- [ ] Plot some utility financial data (maybe FERC 1 large plant expenses over time?)
- [ ] Make a service territory map
- [ ] Plot state-level electricity demand estimates.
- [ ] Demonstrate the link between FERC and EIA data.
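
As a rough illustration of the "Load data from SQLite" item above, here is a minimal sketch using only Python's standard-library `sqlite3` module. It is not taken from the PUDL example notebooks. The helper names (`list_tables`, `read_table`, `build_demo_db`), the `demo_plants` table, and its columns are all hypothetical stand-ins, and the demo builds a tiny throwaway database rather than assuming anything about the real `pudl.sqlite` schema.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_table(db_path, table, limit=5):
    """Return up to `limit` rows of `table` as a list of dicts."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the tables that actually exist before interpolating it.
    if table not in list_tables(db_path):
        raise ValueError(f"No such table: {table!r}")
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row  # rows become addressable by column name
        cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
        return [dict(row) for row in cursor]


def build_demo_db(db_path):
    """Create a tiny stand-in database (hypothetical table and columns)."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(
            "CREATE TABLE demo_plants "
            "(plant_id INTEGER PRIMARY KEY, plant_name TEXT, capacity_mw REAL)"
        )
        conn.executemany(
            "INSERT INTO demo_plants VALUES (?, ?, ?)",
            [(1, "Example Plant A", 100.0), (2, "Example Plant B", 250.5)],
        )
        conn.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = str(Path(tmp_dir) / "demo.sqlite")
        build_demo_db(demo_path)
        print(list_tables(demo_path))
        for row in read_table(demo_path, "demo_plants"):
            print(row)
```

In a notebook, the same pattern would presumably be pointed at the path of the downloaded `pudl.sqlite` file instead of the demo database; calling `list_tables` first is one way to discover the available table names without guessing them.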
# Stretch Goals
- [ ] Do a versioned data release on AWS & Zenodo
- [ ] Update `ferc-xbrl-extractor` to Frictionless v5 so we can correctly annotate the XBRL derived SQLite DBs.
- [ ] Update `pudl` to Frictionless v5 so we can correctly annotate the PUDL SQLite DB
- [ ] Create a valid `dataset-metadata.json` annotating all nightly build outputs for easier use on Kaggle.
- [ ] Create a Kaggle Organization to manage our datasets and competitions going forward

jdangerx commented 1 year ago

Minimal to-do list to get something we can demo, if @zaneselvans is indisposed: