catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Harvesting process needs to account for composite primary keys #613

Closed: zaneselvans closed this issue 2 years ago

zaneselvans commented 4 years ago

I think there might be an issue with the current harvesting process as applied to the 861 tables. For example, the sales_eia861 table has a utility_id_eia column, plus some other columns, like state, that are denormalized in the 860 and 923 tables. In sales_eia861, however, state is part of the primary key, as is balancing_authority_code_eia, so those values can't be stripped out of the table without damaging it. Those same fields also show up in plants_eia860, where they do need to be stripped out and associated with the entity that's being constructed.
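A minimal sketch of what a key-aware harvesting step could look like, assuming tables are represented as lists of dicts. `harvest_table`, `entity_id`, and `entity_columns` are hypothetical names, not PUDL's actual harvesting code, and the sample rows, placeholder balancing authority codes, and four-column key are illustrative only. The idea is that a column is stripped and attached to the entity only when it is not part of the table's primary key.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, object]


def harvest_table(
    records: Iterable[Record],
    primary_key: Sequence[str],
    entity_id: str,
    entity_columns: Sequence[str],
) -> Tuple[List[Record], List[Record]]:
    """Hypothetical key-aware harvesting step (not PUDL's implementation).

    Entity columns that are NOT part of the table's primary key are moved
    into per-entity records; primary key columns are left in place and are
    not treated as entity attributes for this table.
    """
    key = set(primary_key)
    harvestable = [c for c in entity_columns if c not in key]
    entities: Dict[object, Record] = {}
    data: List[Record] = []
    for row in records:
        ent = entities.setdefault(row[entity_id], {entity_id: row[entity_id]})
        for col in harvestable:
            # Simplification: keep the first value seen for each entity.
            ent.setdefault(col, row.get(col))
        # Drop only the harvested columns; key columns always survive.
        data.append({k: v for k, v in row.items() if k not in harvestable})
    return data, list(entities.values())


# Illustrative rows and key; the real sales_eia861 schema has more columns.
SALES_PK = ("utility_id_eia", "state", "balancing_authority_code_eia", "report_date")
sales = [
    {"utility_id_eia": 1, "state": "CO", "balancing_authority_code_eia": "BA_A",
     "report_date": "2019-01-01", "utility_name_eia": "Example Utility", "sales_mwh": 10.0},
    {"utility_id_eia": 1, "state": "WY", "balancing_authority_code_eia": "BA_B",
     "report_date": "2019-01-01", "utility_name_eia": "Example Utility", "sales_mwh": 5.0},
]
data, utilities = harvest_table(
    sales, SALES_PK, entity_id="utility_id_eia",
    entity_columns=("state", "utility_name_eia"),
)
```

Here `state` stays in every sales row because it is part of the key, while `utility_name_eia` moves to the utility record. Run against a table like plants_eia860, whose key does not include `state`, the same call would harvest `state` as well.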

I've tested this process, and it does in fact remove primary key columns from the sales_eia861 table.
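A regression check along these lines would flag the behavior described above. It is a hypothetical helper, not an existing PUDL test, and it assumes the same list-of-dicts representation: it verifies that every primary key column survives harvesting and that the key still uniquely identifies each row.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def check_primary_key(table_name: str, records: List[Record], primary_key: Sequence[str]) -> None:
    """Hypothetical check: raise if a key column is missing or a key repeats."""
    seen = set()
    for i, row in enumerate(records):
        missing = [c for c in primary_key if c not in row]
        if missing:
            raise ValueError(f"{table_name} row {i}: missing primary key column(s) {missing}")
        key = tuple(row[c] for c in primary_key)
        if key in seen:
            raise ValueError(f"{table_name}: duplicate primary key {key}")
        seen.add(key)


# A table whose key columns were stripped by harvesting fails the check.
broken = [{"utility_id_eia": 1, "report_date": "2019-01-01", "sales_mwh": 10.0}]
try:
    check_primary_key("sales_eia861", broken, ("utility_id_eia", "state", "report_date"))
except ValueError as err:
    print(err)
```

Checking uniqueness as well as presence matters because stripping a key column can silently collapse distinct rows (for example, one utility's sales in two states) into apparent duplicates.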

cmgosnell commented 2 years ago

Subsumed within #806.