The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Before we can begin integrating XBRL data into existing pipelines (like PUDL and RMI's pipeline) we must develop tools for working with XBRL data. This epic tracks the ongoing work to develop a tool for extracting data from XBRL filings, as well other important infrastructure, like archiving filings distributed with FERC's new RSS feed.
Known irregularities
Many of the tables in the XBRL data do not have any equivalent to row_num and spplmnt_num from the historical data. These fields are used to uniquely identify records, so this seems like a problem, but I believe an equivalent to these fields is (hopefully) always included when records can not be uniquely identified by other fields. For example, the table 410 - Schedule - Generating Plant Statistics (which is equivalent to f1_gnrt_plant) has a column GeneratingPlantStatisticsAxis, which contains values in the form {spplmnt_num}-{row_num}.
The easiest way to automate the transformation from extracted XBRL data to be compatible with historical data is to use the order of columns in tables. This is because the column names are different enough to make it difficult to match columns by name. This is explored in this notebook. Unfortunately, the columns are not always in the same order, however. For example, in the table f1_steam the column asset_retire_cost is near the end of the table among footnote columns, while it is towards the middle in the equivalent XBRL table. The rest of the columns are in the same order, but this one column needs to be accounted for in some way.
While most tables contain essentially the same structure with different column names, there are some columns with different structures. For example, f1_plant_in_srvce contains the columns begin_yr_bal and yr_end_bal. In the XBRL data, however, these values are reported in the same column, but with different dates to identify them. This is not a particularly difficult situation to deal with, but irregularities like this may prove to be difficult to identify in an automated way.
List of tables used by RMI
Respondent ids and names
f1_respondent_id
Balance sheet (assets) breakdown
f1_comp_balance_db
f1_utltyplnt_smmry
f1_plant_in_srvce
f1_accumdepr_prvsn
Balance sheet (liabilities) breakdown
f1_bal_sheet_cr
f1_retained_erng
Income statement breakdown
f1_income_stmnt
f1_incm_stmnt_2
f1_elctrc_oper_rev
f1_dacs_epda
f1_elc_op_mnt_expn
Plant tables, for both balance sheet and income statement breakdown, to be overwritten by ferc-eia-deprish in the future
Background
Before we can begin integrating XBRL data into existing pipelines (like PUDL and RMI's pipeline) we must develop tools for working with XBRL data. This epic tracks the ongoing work to develop a tool for extracting data from XBRL filings, as well other important infrastructure, like archiving filings distributed with FERC's new RSS feed.
Known irregularities
Many of the tables in the XBRL data do not have any equivalent to
row_num
andspplmnt_num
from the historical data. These fields are used to uniquely identify records, so this seems like a problem, but I believe an equivalent to these fields is (hopefully) always included when records can not be uniquely identified by other fields. For example, the table410 - Schedule - Generating Plant Statistics
(which is equivalent tof1_gnrt_plant
) has a columnGeneratingPlantStatisticsAxis
, which contains values in the form{spplmnt_num}-{row_num}
.The easiest way to automate the transformation from extracted XBRL data to be compatible with historical data is to use the order of columns in tables. This is because the column names are different enough to make it difficult to match columns by name. This is explored in this notebook. Unfortunately, the columns are not always in the same order, however. For example, in the table
f1_steam
the columnasset_retire_cost
is near the end of the table among footnote columns, while it is towards the middle in the equivalent XBRL table. The rest of the columns are in the same order, but this one column needs to be accounted for in some way.While most tables contain essentially the same structure with different column names, there are some columns with different structures. For example,
f1_plant_in_srvce
contains the columnsbegin_yr_bal
andyr_end_bal
. In the XBRL data, however, these values are reported in the same column, but with different dates to identify them. This is not a particularly difficult situation to deal with, but irregularities like this may prove to be difficult to identify in an automated way.List of tables used by RMI
Respondent ids and names
Balance sheet (assets) breakdown
Balance sheet (liabilities) breakdown
Income statement breakdown
Plant tables, for both balance sheet and income statement breakdown, to be overwritten by ferc-eia-deprish in the future
For purchased power breakout:
Additional table handed off to optimus:
Additional table handed to transmission team:
f1_xmssn_line
Have looked at, might use in the near future:
f1_sales_by_sched
f1_othr_reg_assets
f1_othr_reg_liab
Tasks (to be turned into issues)
[x] #1579
[x] #1594
[x] #1595
[x] #1593
[x] #1629
[x] #1630