catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
469 stars 108 forks source link

Transform `f1_elctrc_erg_acct` xbrl + dbf #1819

Closed aesharpe closed 1 year ago

aesharpe commented 2 years ago

Table Notes

elec_energy_acct_dbf = pd.read_sql("f1_elctrc_erg_acct", ferc1_engine)
elec_energy_acct_xbrl = pd.read_sql("electric_energy_account_401a_duration", ferc1_xbrl_engine)
jrea-rmi commented 2 years ago

I have both xbrl and dbf transform steps into the tidy format we're using in the Hub here: https://github.com/rmi-electricity/utility-transition-hub/blob/f8fc3c49caf2b2d14238468b91770ecf94183704/hub/energy_sources_disposition.py

You can look at the final output by running

from hub.energy_sources_disposition import EnergySourcesDisp
energy_sources_f1 = EnergySourcesDisp()._create_energy_sources_f1()
disposition_f1 = EnergySourcesDisp()._create_disposition_f1()

The part my transformation/combination is missing is conversion from old respondent_id to new entity_id. I didn't see that table in ferc1_xbrl.sqlite, does it exist elsewhere?

zaneselvans commented 1 year ago

Hey @jrea-rmi maybe this has already become clear, but the mapping between the old respondent_id and the new entity_id can't happen in the context of extracting the XBRL into an SQLite DB, because the old IDs aren't in there. It's been integrated into the PUDL ID mappings in the PUDL DB. I imagine @cmgosnell has already chatted with you about this since she did most of that work.

jrea-rmi commented 1 year ago

@zaneselvans yes, I've chatted with @cmgosnell about it, and am expecting the final transformed/combined PUDL version of FERC tables to use utility_id_ferc1 as an identifier with another mapping table(s) that connect utility_id_ferc1 to respondent_id and entity_id.

Those mappings exist in the PUDL DB where? Not in datasette yet, right?