catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
477 stars 110 forks source link

find/add inter-table relationships when they include a second dimension #2623

Closed cmgosnell closed 1 year ago

cmgosnell commented 1 year ago

Ex: in the income_statement_ferc1 table, there is an operating_revenues factoid. In the electric_operating_revenues_ferc1 there is a electric_operating_revenues factoid. They are conceptually related but not the same and do not have the same factoid name. The income_statement_ferc1 table & thus the operating_revenues has another dimension of the utility_type. So there is a line of operating_revenues in the income statement table which is electric.

This one is really easy to see bc it is {other_dimension}_{xbrl_factoid_name} but i expect it won't always be that straightforward.

If we add in the implied dimension in the tables where these dimensions are implied (which would probably be helpful anyway) and renamed the factoids to match we could convert these conceptual connections into more tangible ones that can be id-ed programmatically, checked and scrubbed for the 💥

- [x] HUNT DOWN all of these inter-table connections that don't link up by name (@jrea-rmi  if you have any pointers on this that would be vv helpful)
- [x] when required, add the other dimension into the implicitly-dimension-ed table (i.e. `assign(utility_type="electric")` to `electric_operating_revenues_ferc1`.
- [ ] possibly add check into the explosion (e.g. if dimension and factoid name the same, remove duplicate)
e-belfer commented 1 year ago

Will continue to update as I go through the tables

For the electric-specific columns of income_statement_ferc1 table:

e-belfer commented 1 year ago

@jrea-rmi Just circling back to this. Do you all have a working list of fields which are not identically named but are effectively identical and should be compared across tables? See above for some examples. We're particularly looking at 'second dimension' relationships (e.g. electric plants), but if there are others you're aware of this is also helpful to know!

jrea-rmi commented 1 year ago

@e-belfer I don't already have a compiled list like this, but I could review my previous table connections and see if I have additions to what you have.

A few balance sheet fields to add to your income statement notes are:

e-belfer commented 1 year ago

Thanks for the follow-up and the suggestions! I'll work on compiling a list and push it to a PR for you to look over to see that nothing's missing or out-of-place.

cmgosnell commented 1 year ago

closing this issue bc we merged #2669 into explode_ferc1