catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Derive unique source `record_id` from FERC 1 XBRL data #1706

Closed: cmgosnell closed this issue 2 years ago

cmgosnell commented 2 years ago

`pudl.transform.ferc1._clean_cols` creates a new column called `record_id` with this format:

`{table_name}_{report_year}_{report_prd}_{respondent_id}_{spplmnt_num}_{row_number}`
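As a minimal sketch, the format above can be reproduced with a Python f-string. The function name `make_dbf_record_id` and the example values are hypothetical. This mirrors the documented format string only and is not the actual code inside `_clean_cols`.

```python
def make_dbf_record_id(
    table_name: str,
    report_year: int,
    report_prd: int,
    respondent_id: int,
    spplmnt_num: int,
    row_number: int,
) -> str:
    """Build a record_id following the DBF-era format (hypothetical helper)."""
    # Components are joined with "_" in the documented order.
    return (
        f"{table_name}_{report_year}_{report_prd}_"
        f"{respondent_id}_{spplmnt_num}_{row_number}"
    )


# Hypothetical example values, not real FERC Form 1 data.
print(make_dbf_record_id("f1_fuel", 2020, 12, 134, 0, 5))
# prints f1_fuel_2020_12_134_0_5
```

Every component comes straight from a source column, so the ID stays the same each time the same raw record is processed.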

The new XBRL data has no `respondent_id`, `spplmnt_num`, or `row_number` columns. We'll have to do something about `respondent_id` (see #1705), but the supplement and row numbers are gone entirely.

This ID column is useful because it gives each record a stable, unique identifier.
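Without supplement and row numbers, one possible approach (an assumption here, not necessarily what was eventually implemented) is to build the ID from whichever columns together identify a row in each XBRL table, and to check that uniqueness explicitly. The helper `make_xbrl_record_ids`, the table name `plants_steam`, and the column names below are all hypothetical.

```python
from typing import Iterable, Mapping, Sequence


def make_xbrl_record_ids(
    table_name: str,
    rows: Iterable[Mapping[str, object]],
    key_cols: Sequence[str],
) -> list[str]:
    """Build one record_id per row from an assumed set of key columns.

    Raises ValueError if the key columns do not uniquely identify the rows,
    since a record_id is only useful if it is unique.
    """
    ids = []
    for row in rows:
        # Join the table name and each key column value with "_".
        parts = [table_name] + [str(row[col]) for col in key_cols]
        ids.append("_".join(parts))
    if len(set(ids)) != len(ids):
        raise ValueError(f"key columns {list(key_cols)} do not yield unique ids")
    return ids


# Hypothetical rows; entity_id stands in for the XBRL company identifier.
rows = [
    {"entity_id": "C000123", "report_year": 2021, "plant_name": "alpha"},
    {"entity_id": "C000123", "report_year": 2021, "plant_name": "beta"},
]
print(make_xbrl_record_ids("plants_steam", rows, ["entity_id", "report_year", "plant_name"]))
```

The IDs are only as stable as the key columns: if a value such as a plant name is edited between filings, the ID changes. The uniqueness check also catches collisions that joining with `_` can introduce when values themselves contain underscores.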

zschira commented 2 years ago

Issue #1705 creates a mapping between `respondent_id`s and the new `entity_id` (Company Identifier). However, not every utility will have both an `entity_id` and a `respondent_id`, so we will need to develop a strategy to work around that.
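One possible workaround, sketched here as an assumption rather than the strategy the project adopted, is to prefer the legacy `respondent_id` when the #1705 mapping provides one and fall back to the `entity_id` otherwise. The function `utility_key`, the `r`/`e` prefixes, and the example IDs are hypothetical.

```python
from typing import Mapping


def utility_key(entity_id: str, entity_to_respondent: Mapping[str, int]) -> str:
    """Return a utility identifier for use in a record_id (hypothetical).

    Uses the mapped respondent_id when one exists; otherwise falls back to
    the XBRL entity_id. The prefixes keep the two ID namespaces distinct.
    """
    respondent_id = entity_to_respondent.get(entity_id)
    if respondent_id is not None:
        return f"r{respondent_id}"
    return f"e{entity_id}"


# Hypothetical mapping of the kind #1705 produces.
mapping = {"C000123": 134}
print(utility_key("C000123", mapping))  # mapped utility
print(utility_key("C000999", mapping))  # unmapped utility
```

The prefixes make it explicit which kind of identifier a given `record_id` was built from, so a numeric `respondent_id` can never be confused with an `entity_id`.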

zaneselvans commented 2 years ago

This function has been implemented as part of #1722.