catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
468 stars 107 forks source link

Use consistent data column names in balance sheet and income statement tables #2581

Closed zaneselvans closed 1 year ago

zaneselvans commented 1 year ago

Since we need to be able to easily concatenate the FERC 1 balance sheet and income statement tables, and they all have a similar structure with a single dollar value column, they should all really have the same name, rather than their current table-specific names. The tables & columns that could be updated include the following.

@jrea-rmi & @cmgosnell does this list / categorization look correct to you? Looking at it now it seems like we may only need or want to change the diverse column names in the income statement tables, since the balance sheet tables (as far as I understand them) already have a uniform system of starting/ending balance that we could use to calculate an annual change if we wanted to -- would that be a useful value to keep in the denormalized versions of these tables? Not sure what's going on with retained_earnings_ferc1 which seems to have both starting/ending balances and an amount column.

Column rename process

### Balance Sheet
- [x] `balance_sheet_assets_ferc1` (`f1_comp_balance_db`): Only has `starting_balance` and `ending_balance`
- [x] `balance_sheet_liabilities_ferc1` (`f1_bal_sheet_cr`): Only has `starting_balance` and `ending_balance`
- [x] `plant_in_service_ferc1` (`f1_plant_in_srvce`): Has `starting_balance` and `ending_balance` plus categorized annual changes.
- [x] `electric_plant_depreciation_functional_ferc1` (`f1_accumdepr_prvsn`): `utility_plant_value` (equivalent to `ending_balance`, same as `electric_plant_depreciation_changes_ferc1` but with the changes broken down by functional type rather than account)
- [x] ~`electric_plant_depreciation_changes_ferc1` (`f1_accumdepr_prvsn`): `utility_plant_value` (has both `starting_balance` and `ending_balance` plus per-account changes during the year. However, they are categories in the `depreciation_type` column, rather than columns themselves, because there's another `plant_status` dimension column.~ Skipping for now because it's complicated, the names are already "correct" and @jrea-rmi says he isn't using this table as of yet.
- [x] `utility_plant_summary_ferc1` (`f1_utltyplnt_smmry`): `utility_plant_value`. Looking at the form it seems like this is equivalent to `ending_balance`.
- [ ] `retained_earnings_ferc1` (`f1_retained_erng`): Has `starting_balance` and `ending_balance` and also `amount`. Unclear looking at the form what `amount` corresponds to. Mostly NA start/end balances. Looking at form it seems like it might be unstructured data? Something is wrong here. The primary affected FERC account is getting lost, the freeform `item` description is also being lost.  I think we need to revisit this transform.
### Income Statement
- [x] `income_statement_ferc1` (`f1_sales_by_sched`, `f1_sales_by_sched`): `income`
- [x] `electric_operating_expenses_ferc1` (`f1_elc_op_mnt_expn`): `expense`
- [x] `electric_operating_revenues_ferc1` (`f1_elctrc_oper_rev`): `revenue`
- [x] `depreciation_amortization_summary_ferc1` (`f1_dacs_epda`): `depreciation_amortization_value`
- [x] `electricity_sales_by_rate_schedule_ferc1` (`f1_sales_by_sched`): `sales_revenue`
jrea-rmi commented 1 year ago

retained_earnings_ferc1 has both balances (starting_balance and ending_balance, FERC accounts in the 200s) and flows that explain changes during the year to go from the starting balance to the ending balance (amount, FERC accounts in the 400s). However, in 2020 and earlier data, they seem to have put some ending_balance values in the amount column?

For current Hub analysis, we're just using ending_balance to explode one line of the liabilities side of the balance sheet - except for that issue that some values might be appearing in the wrong column.

The above issue for retained_earnings_ferc1 makes me like the idea of keeping starting_balance and ending_balance as the named fields for balance sheet tables. Then for income statement tables, expense/revenue/income are all flows of dollars and could be conformed to the same name of dollar_amount.

zaneselvans commented 1 year ago

@jrea-rmi looking at the balance sheet and liabilities tables in the RMI XBRL Table Integration sheet I see a few tables that don't show up in your presentation about the exploded tables, namely:

Do these liabilities tables not feed into the exploded table calculations at all?

There are also two different tables that we've derived from the two portions of the f1_accumdepr_prvsn table -- one that tracks changes in the balance by account, and the other that tracks the changes by plant function (steam, hydro, etc.) Do you want to be able to track both of those kinds of changes with aggregations in the exploded tables?

jrea-rmi commented 1 year ago

The presentation isn't comprehensive, sorry about that.

I stack the liabilities side of the balance sheet on top of the assets, then label lines as a component of rate base or not in the same way. Some of the liabilities offset parts of assets.

In f1_accumdepr_prvsn, the only part I've used is section B, balances by functional classification (steam, nuclear, etc.)

e-belfer commented 1 year ago

Closed, retained_earnings_ferc1 issues will be resolved separately.