catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Aggregate `data_maturity` appropriately in `generation_fuel_nuclear_eia923` transform #1914

Closed zaneselvans closed 1 year ago

zaneselvans commented 1 year ago

The earliest EIA-923 data (2001-2002) doesn't report `nuclear_unit_id` in the `generation_fuel_eia923` table, but it's a primary key column of the resulting `generation_fuel_nuclear_eia923` table, so it has to have a value. We do a little backfilling and aggregation to produce usable nuclear generation records in those years, but this process doesn't propagate the new `data_maturity` label into the created records. This causes an error downstream in the generic `generation_fuel_eia923` output table, which combines the nuclear and non-nuclear generation fuel tables into a single table with date-plant-prime-fuel as the primary key. Building that table requires aggregating across `nuclear_unit_id`, and some of the records being aggregated have NA values in `data_maturity`, which registers as inhomogeneous (and thus not aggregatable).

Update the `generation_fuel_nuclear_eia923` transform process to ensure that the small number of old aggregated records we create all have a `data_maturity` label derived from their input records.
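
For illustration only, here is a rough pandas sketch of one way a label could be carried through an aggregation. It is not the actual PUDL transform: the helper `agg_data_maturity`, the toy records, and the column names other than `data_maturity` are assumptions made up for this example, as is the rule of keeping a label only when all non-null inputs agree.

```python
import pandas as pd


def agg_data_maturity(maturity: pd.Series):
    """Hypothetical helper: keep the label only if all non-null inputs agree."""
    labels = maturity.dropna().unique()
    return labels[0] if len(labels) == 1 else pd.NA


# Toy input records (assumed columns and values, not real EIA-923 data).
gf = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 2],
        "report_date": pd.to_datetime(["2001-01-01"] * 4),
        "net_generation_mwh": [10.0, 20.0, 5.0, 7.0],
        "data_maturity": ["final", "final", "final", "provisional"],
    }
)

# Sum the data column and derive data_maturity from the input records,
# rather than leaving it unset on the newly created aggregate rows.
aggregated = gf.groupby(["plant_id_eia", "report_date"], as_index=False).agg(
    net_generation_mwh=("net_generation_mwh", "sum"),
    data_maturity=("data_maturity", agg_data_maturity),
)
```

In this sketch the plant whose inputs share one label keeps it, while the plant with mixed labels gets NA, so genuinely inhomogeneous inputs are still visible downstream.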

zaneselvans commented 1 year ago

This has been merged into `dev`, so closing.