catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Integrate Remaining EIA-860 & EIA-923 Tables #1189

Open cmgosnell opened 3 years ago

cmgosnell commented 3 years ago

We have integrated most of the tables from EIA 860 and 923, but we're still missing several. This issue collects all tables that are still missing, so we can keep track of our progress towards complete data integration.

### Remaining EIA 860 Tables
- [ ] #1159
- [ ] #1162
### Remaining EIA 923 Tables
- [ ] https://github.com/catalyst-cooperative/pudl/issues/457
- [ ] #2448
- [ ] https://github.com/catalyst-cooperative/pudl/issues/458
- [ ] https://github.com/catalyst-cooperative/pudl/issues/1302
- [ ] Integrate EIA 923 Stocks Data
- [ ] Integrate EIA 923 plant frame data
- [ ] Integrate EIA 923 Schedule 6_7 Source and Disposition data
### Harvesting tasks
- [ ] https://github.com/catalyst-cooperative/pudl/issues/3365

zaneselvans commented 2 years ago

@grgmiller I feel like I just saw a comment from you somewhere about tackling some of this missing data, but I can't find it now. Do you still need/want some guidance on the steps for bringing a new table in?

grgmiller commented 2 years ago

Hi Zane, yes, I just posted in the general Slack channel about that. That would be helpful if the guidance exists!

Mainly, it would help to understand where I need to add new table names, field names, metadata, etc. I could take a trial-and-error approach, using the harvesting_debug.ipynb notebook to try to load a new table specified in the settings YAML file until it fails and then fixing each issue, but that seems like an inefficient approach.

zaneselvans commented 2 years ago

Okay @grgmiller just so we have this somewhere relevant and can migrate it to the docs if it turns out to be correct, here's what I think needs to happen to get a new table from the EIA spreadsheets integrated. It kind of trails off in the details at the end but this should be enough to get you started!

Raw Data Source Metadata

Data Extraction

Data Transformation

Database Schema Definition
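
As a rough illustration of how those four pieces relate to each other, here is a toy sketch in plain Python. It does not use PUDL's actual modules or function names: `COLUMN_MAP`, `SCHEMA`, `extract`, `transform`, and `validate` are all hypothetical, and the rows are made-up values. It only shows the general shape of the work: a per-year column map (raw source metadata), a renaming step (extraction), a cleaning step (transformation), and a declared set of fields and types (schema definition).

```python
# Hypothetical sketch: these names and values are illustrative assumptions,
# not PUDL's real API or real EIA data.

# Raw data source metadata: map each year's raw column headers
# onto one standardized column name per field.
COLUMN_MAP = {
    2019: {"Plant ID": "plant_id_eia", "Stocks (tons)": "stocks_tons"},
    2020: {"Plant Id": "plant_id_eia", "Stock Tons": "stocks_tons"},
}

# Schema definition: the fields and types the finished table must have.
SCHEMA = {"plant_id_eia": int, "report_year": int, "stocks_tons": float}


def extract(raw_rows, year):
    """Extraction: rename raw columns and record the report year."""
    mapping = COLUMN_MAP[year]
    renamed = []
    for row in raw_rows:
        new_row = {mapping[col]: val for col, val in row.items() if col in mapping}
        new_row["report_year"] = year
        renamed.append(new_row)
    return renamed


def transform(rows):
    """Transformation: cast fields to schema types; blanks become None."""
    cleaned = []
    for row in rows:
        out = {}
        for field, dtype in SCHEMA.items():
            value = row.get(field)
            out[field] = None if value in (None, "", ".") else dtype(value)
        cleaned.append(out)
    return cleaned


def validate(rows):
    """Check that every row has exactly the fields named in the schema."""
    for row in rows:
        if set(row) != set(SCHEMA):
            raise ValueError(f"Unexpected fields: {sorted(row)}")
    return rows


# Made-up raw rows standing in for one year of a spreadsheet page.
raw_2020 = [
    {"Plant Id": "3", "Stock Tons": "1250.5"},
    {"Plant Id": "7", "Stock Tons": "."},
]
table = validate(transform(extract(raw_2020, 2020)))
```

Keeping the column map and the schema as plain data, separate from the functions, means that adding another year or another field is mostly a matter of editing those two dictionaries rather than the code.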