catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Refactor `fuel_ferc1` transform for XBRL + DBF inputs #1722

Closed cmgosnell closed 2 years ago

cmgosnell commented 2 years ago

Adapt the fuel_ferc1 transformation process to use the new abstractions developed in #1739, and to accommodate raw inputs from both the old DBF and new XBRL data.

DBF specific transforms

XBRL specific transforms

Table-specific post-concatenation transforms

Generic FERC Form 1 final transformations

Must Fix

Other Loose Ends

Issues resulting from or related to this issue

FERC 1 specific questions:

Note: the fuel_ferc1 transform has to be done before plants_steam_ferc1, because the algorithmic assignment of plant_id_ferc1 values depends on fuel information. This issue is therefore blocking #1707.

cmgosnell commented 2 years ago

There are 126 records where a plant has two records with the same fuel_type_code_pudl. The original FuelTypeAxis is unique, but the cleaned codes are not.

Because of this, it breaks the new convention that we can use the XBRL Axis columns as primary keys when creating the record_id.

We could use some hash or something other than a composite key, but in an attempt to preserve the primary key, the idea right now is to condense/aggregate these duplicate records.
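To make the problem concrete, here is a minimal pandas sketch of how a key that is unique on the raw axis can stop being unique after cleaning. The column names (`entity_id`, `plant_name_axis`, `fuel_type_axis`, `fuel_consumed_units`), the sample values, and the `FUEL_MAP` mapping are all made up for illustration; they are not PUDL's actual schema or cleaning rules.

```python
import pandas as pd

# Hypothetical XBRL-style records: the raw fuel axis is unique per plant,
# but two different raw values clean to the same code.
raw = pd.DataFrame(
    {
        "entity_id": ["C001", "C001", "C001"],
        "report_year": [2021, 2021, 2021],
        "plant_name_axis": ["plant_a", "plant_a", "plant_a"],
        "fuel_type_axis": ["coal", "bituminous coal", "gas"],
        "fuel_consumed_units": [100.0, 50.0, 10.0],
    }
)

# Illustrative mapping only (an assumption, not PUDL's real categories).
FUEL_MAP = {"coal": "coal", "bituminous coal": "coal", "gas": "gas"}
raw["fuel_type_code_pudl"] = raw["fuel_type_axis"].map(FUEL_MAP)

raw_key = ["entity_id", "report_year", "plant_name_axis", "fuel_type_axis"]
clean_key = ["entity_id", "report_year", "plant_name_axis", "fuel_type_code_pudl"]

# Flag every record that shares its key with another record.
raw_dupes = raw.duplicated(subset=raw_key, keep=False)
clean_dupes = raw.duplicated(subset=clean_key, keep=False)

print(f"duplicates on raw key: {raw_dupes.sum()}")
print(f"duplicates on cleaned key: {clean_dupes.sum()}")
```

A check like `duplicated(subset=..., keep=False)` is one way to count the affected records before deciding whether to condense, aggregate, or drop them.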

condense them

There are a slew of records like this (image), where one is effectively empty except for one data point.

for these, i built a little function called condense_sets_of_records_with_one_datapoint_in_second_record (omigosh please halp me rename this)
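The actual function isn't shown in this thread, so the following is only a guess at the general idea: when a key has exactly two records and the sparser one holds a single data point, fold the pair into one record by taking the first non-null value per column. The name `condense_sparse_pairs`, the column names, and the rule itself are assumptions, and the sketch assumes every column is either a key column or a data column.

```python
import pandas as pd


def condense_sparse_pairs(df, key_cols, data_cols):
    """Merge pairs of same-key records where one record has a single data point.

    Hypothetical sketch: groups of any other shape are left untouched.
    """
    keys = [df[c] for c in key_cols]
    # Number of non-null data values in each record.
    n_values = df[data_cols].notna().sum(axis=1)
    group_size = n_values.groupby(keys).transform("size")
    min_values = n_values.groupby(keys).transform("min")
    condensable = (group_size == 2) & (min_values == 1)

    # Put the fuller record first so its values win any conflict, then take
    # the first non-null value per column within each pair.
    pairs = (
        df[condensable]
        .assign(_n=n_values[condensable])
        .sort_values("_n", ascending=False, kind="stable")
    )
    merged = pairs.groupby(key_cols, as_index=False, sort=False)[data_cols].first()
    return pd.concat([df[~condensable], merged], ignore_index=True)


records = pd.DataFrame(
    {
        "plant": ["a", "a", "b", "c", "c"],
        "fuel": ["coal", "coal", "gas", "coal", "coal"],
        "units": [100.0, None, 5.0, 1.0, 2.0],
        "cost": [None, 2.5, 1.0, 3.0, 4.0],
        "heat": [10.0, None, 1.0, 7.0, 8.0],
    }
)
condensed = condense_sparse_pairs(
    records, key_cols=["plant", "fuel"], data_cols=["units", "cost", "heat"]
)
print(condensed)
```

Here the two `a`/`coal` records collapse into one, while the two fully populated `c`/`coal` records are left alone, since those would need aggregation instead.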

aggregate them

i'll probably use pudl.helpers.sum_and_weighted_average_agg()
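For readers unfamiliar with this kind of aggregation, here is a standalone pandas sketch of the technique: sum the quantity columns and take a weighted average of the per-unit columns. It does not call the PUDL helper, whose signature and behavior are not shown in this thread; `sum_and_weighted_average_sketch`, its arguments, and the column names are hypothetical.

```python
import pandas as pd


def sum_and_weighted_average_sketch(df, by, sum_cols, wtavg_dict):
    """Sum `sum_cols` and weighted-average each key of `wtavg_dict`.

    `wtavg_dict` maps a per-unit column to the column used as its weight.
    Hypothetical stand-in, not the PUDL implementation.
    """
    out = df.groupby(by)[sum_cols].sum()
    for data_col, weight_col in wtavg_dict.items():
        # Ignore the weight wherever the value is missing, so that null
        # values do not drag the average toward zero.
        weights = df[weight_col].where(df[data_col].notna())
        tmp = df[by].assign(_wx=df[data_col] * weights, _w=weights)
        sums = tmp.groupby(by)[["_wx", "_w"]].sum()
        out[data_col] = sums["_wx"] / sums["_w"]
    return out.reset_index()


fuel = pd.DataFrame(
    {
        "plant": ["a", "a", "a"],
        "fuel_type_code_pudl": ["coal", "coal", "gas"],
        "fuel_consumed_units": [100.0, 50.0, 10.0],
        "fuel_cost_per_unit": [2.0, 5.0, 1.0],
    }
)
aggregated = sum_and_weighted_average_sketch(
    fuel,
    by=["plant", "fuel_type_code_pudl"],
    sum_cols=["fuel_consumed_units"],
    wtavg_dict={"fuel_cost_per_unit": "fuel_consumed_units"},
)
print(aggregated)
```

The two coal records combine to 150 units at a weighted cost of (100 × 2.0 + 50 × 5.0) / 150 = 3.0 per unit, which is what a plain mean of the two costs (3.5) would get wrong.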

delete/null them?

These nuclear records seem... unsalvageable. I don't know what to do with them. (image)

zaneselvans commented 2 years ago

I'm calling this issue ready for review, and have created several separate smaller issues pertaining to testing & transform parameter validation, which I'll move on to now @cmgosnell:

zaneselvans commented 2 years ago

With #1903 and #1900 getting merged into xbrl_steam, this issue is done.