catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Unexpected primary key for `denorm_fuel_by_plant_ferc1` #2539

Closed by e-belfer 1 year ago

e-belfer commented 1 year ago

Describe the bug

As found in #2521, the denorm_fuel_by_plant_ferc1 table aggregates fuel types by plant. However, no plant ID column uniquely identifies its rows: neither (report_year, plant_id_pudl) nor (report_year, plant_name_ferc1) is a primary key. This contradicts the table description and is unexpected.

Bug Severity

How badly is this bug affecting you? Low: The bug isn't causing me problems, but something's still wrong here.

To Reproduce

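from dagster import AssetKey
from pudl.etl import defs  # assumed import path for PUDL's Dagster definitions
asset_name = "denorm_fuel_by_plant_ferc1"
new_df = defs.load_asset_value(AssetKey(asset_name))
# Evaluates to False; the same holds with `plant_name_ferc1` in place of `plant_id_pudl`.
new_df.set_index(["report_year", "plant_id_pudl"]).index.is_unique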
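
To see which rows break uniqueness, rather than only that it fails, something like the following may help. It is a minimal sketch on a toy DataFrame: every value is invented for illustration, and `fuel_cost` is a hypothetical stand-in column, not necessarily a real column of the table.

```python
import pandas as pd

# Toy stand-in for the real table; all values are invented for illustration.
toy = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2020, 2021],
        "plant_id_pudl": [1, 1, 2, 1],
        "utility_id_ferc1": [10, 11, 10, 10],
        "fuel_cost": [5.0, 7.0, 3.0, 6.0],  # hypothetical value column
    }
)

candidate_key = ["report_year", "plant_id_pudl"]

# keep=False flags every row that shares its key with at least one other row,
# so all members of each duplicated group are returned, not just the repeats.
dupes = toy[toy.duplicated(subset=candidate_key, keep=False)]
print(dupes)
```

In this toy data the two flagged rows share a plant ID and year but differ in `utility_id_ferc1`, which is one way a plant-level key can fail to be unique.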

Expected behavior

Some plant ID column, combined with report_year, should act as the primary key for this table.

Additional context

Up until now we haven't applied a schema to the output tables, so we haven't had to consider this yet.

zaneselvans commented 1 year ago

Sadly, there are no real plant IDs for the FERC Form 1 plants. The primary key will be something like (utility_id_ferc1, plant_name, report_year).
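
A composite key like this can be checked in the same way as the single plant ID. The sketch below uses a toy DataFrame with invented values, assumes the plant name column is called `plant_name_ferc1` as in the reproduction snippet, and uses a hypothetical helper `is_primary_key` that is not part of PUDL.

```python
import pandas as pd

# Toy data, invented for illustration: the same plant is reported by two
# utilities in 2020, so the plant ID alone cannot identify a row.
toy = pd.DataFrame(
    {
        "utility_id_ferc1": [10, 11, 10, 10],
        "plant_name_ferc1": ["alpha", "alpha", "beta", "alpha"],
        "plant_id_pudl": [1, 1, 2, 1],
        "report_year": [2020, 2020, 2020, 2021],
    }
)


def is_primary_key(df: pd.DataFrame, cols: list[str]) -> bool:
    """Hypothetical helper: True if cols contain no nulls and no repeated combinations."""
    has_nulls = df[cols].isna().any().any()
    has_dupes = df.duplicated(subset=cols).any()
    return not has_nulls and not has_dupes


plant_only = is_primary_key(toy, ["report_year", "plant_id_pudl"])
composite = is_primary_key(
    toy, ["utility_id_ferc1", "plant_name_ferc1", "report_year"]
)
print(plant_only, composite)
```

On this toy data the plant-level key fails while the utility, plant name, and year combination passes, which mirrors the behavior described in this thread.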

e-belfer commented 1 year ago

In this case, the table behaves exactly as expected, as that is the primary key. Closing this bug.