Closed aesharpe closed 1 year ago
I have both xbrl and dbf transform steps into the tidy format we're using in the Hub here: https://github.com/rmi-electricity/utility-transition-hub/blob/f8fc3c49caf2b2d14238468b91770ecf94183704/hub/energy_sources_disposition.py
You can look at the final output by running
from hub.energy_sources_disposition import EnergySourcesDisp
energy_sources_f1 = EnergySourcesDisp()._create_energy_sources_f1()
disposition_f1 = EnergySourcesDisp()._create_disposition_f1()
The part my transformation/combination is missing is conversion from old respondent_id to new entity_id. I didn't see that table in ferc1_xbrl.sqlite, does it exist elsewhere?
Hey @jrea-rmi maybe this has already become clear, but the mapping between the old respondent_id
and the new entity_id
can't happen in the context of extracting the XBRL into an SQLite DB, because the old IDs aren't in there. It's been integrated into the PUDL ID mappings in the PUDL DB. I imagine @cmgosnell has already chatted with you about this since she did most of that work.
@zaneselvans yes, I've chatted with @cmgosnell about it, and am expecting the final transformed/combined PUDL version of FERC tables to use utility_id_ferc1 as an identifier with another mapping table(s) that connect utility_id_ferc1 to respondent_id and entity_id.
Those mappings exist in the PUDL DB where? Not in datasette yet, right?
Table Notes
duration
-- nothing in theinstant
table.duration
table. This year and last year. What to do?megawatt_hours_purchased_other_than_storage
andmegawatt_hours_purchased_for_energy_storage
. A new categoryenergy_stored
was also added. How should we handle these cases where there's not a direct correspondence between the two data sources? I'm inclined to just let them not align, and manage that kind of thing in the transformations or aggregations later.