catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Add 2019 data to EIA861 Short Form table #3654

Open aesharpe opened 1 month ago

aesharpe commented 1 month ago

Overview

The EIA861 form has a regular version and a short form version. In every year except 2019, the short form responses are reported in the raw data in Short_Form_####.xlsx (where #### is the year). In 2019, the short form data is instead dispersed throughout the other tables, in a short_form column, rather than being stored in a dedicated Short_Form table.

Issue #2768 and PR #3565 integrated the short form table into the PUDL database but left the 2019 short form data in the other EIA861 tables.

This issue is dedicated to bringing the 2019 EIA861 short form data into the core_eia861__yearly_short_form table.
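As a rough illustration of the shape of the work, here is a minimal pandas sketch, not the actual PUDL transform code. It assumes that the 2019 tables flag short form rows with a `short_form` value of `"Y"`, and that the pieces can be joined on `utility_id_eia` and `report_date`. The helper names, the flag value, the join keys, and the toy data are all hypothetical.

```python
from functools import reduce

import pandas as pd

# Hypothetical join keys; the real primary key may differ.
KEYS = ["utility_id_eia", "report_date"]


def split_short_form(df, flag_col="short_form", flag_value="Y"):
    """Split one 2019 table into (regular rows, short form rows)."""
    is_short = df[flag_col].eq(flag_value)  # missing values compare as False
    regular = df.loc[~is_short].drop(columns=flag_col).reset_index(drop=True)
    short = df.loc[is_short].drop(columns=flag_col).reset_index(drop=True)
    return regular, short


def append_2019_short_form(short_form, tables_2019):
    """Join the 2019 short form pieces and append them to the other years."""
    pieces = [split_short_form(table)[1] for table in tables_2019]
    merged = reduce(lambda a, b: a.merge(b, on=KEYS, how="outer"), pieces)
    return pd.concat([short_form, merged], ignore_index=True)


# Toy stand-ins for two 2019 EIA861 tables (made-up values).
sales_2019 = pd.DataFrame({
    "utility_id_eia": [1, 2],
    "report_date": ["2019-01-01", "2019-01-01"],
    "sales_mwh": [100.0, 5.0],
    "short_form": [None, "Y"],
})
customers_2019 = pd.DataFrame({
    "utility_id_eia": [1, 2],
    "report_date": ["2019-01-01", "2019-01-01"],
    "customers": [1000, 40],
    "short_form": [None, "Y"],
})
# Toy stand-in for the short form table built from the other years.
short_form_other_years = pd.DataFrame({
    "utility_id_eia": [2],
    "report_date": ["2018-01-01"],
    "sales_mwh": [4.0],
    "customers": [38],
})

regular_sales_2019, _ = split_short_form(sales_2019)
combined = append_2019_short_form(
    short_form_other_years, [sales_2019, customers_2019]
)
```

In this sketch `regular_sales_2019` keeps only the regular-form rows, and `combined` holds the short form records for all years in one frame. The real change would need to do the equivalent inside each affected EIA861 transform.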

Success Criteria

### Next steps
* [ ] ...

zaneselvans commented 1 month ago

Given that addressing this issue will require modifying pretty much all of the EIA-861 transforms, and then somehow integrating the little 2019 short form dataframes together with the bulk of the short form data, I'm hesitant to mark it as a good first issue. It seems like it could be pretty involved.

aesharpe commented 1 month ago

I don't think we have well-defined criteria for a good first issue (GFI) right now. I added the label because this issue has a very clear scope and success criteria and only touches a small part of the code. I can see why you'd be hesitant for new folks to work on it, but I can also envision someone with enough skills to do it. I don't have strong feelings here; I just want to get in the habit of using the GFI label when appropriate.