Closed zschira closed 2 years ago
After some discussion on the status of XBRL integration, we've decided to move away from trying to recreate the historical database. The structure of the data extracted from XBRL filings is an improvement over the historical data, and it probably doesn't make sense to try and go back to the worse structure. For the sake of PUDL, we will develop and maintain two separate ETL's for old/new FERC data. We will, however, provide estimates of the difficulty to create a mapping between new tables and old tables for the specific tables RMI is currently using (#1594).
The process for extracting/transforming FERC data to be compatible with historic FERC data is difficult to impossible to fully automate. In some cases, like the
f1_fuel
table, this is fairly easy. However, in others, likef1_plant_in_srvce
the structure of the data differs significantly, and it's difficult to link XBRL data with historical data. This Visual FoxPro version of this table has the columns:begin_yr_bal
,addition
,retirements
,adjustments
,transfers
,yr_end_bal
While in the equivalent XBRL table, these 6 columns are split out into well over 100 columns, where each of those columns corresponds to a specific
row_number
in the historical data, and thebegin_yr_bal
andyr_end_bal
have to be deduced using dates.FERC's XBRL taxonomy includes
References
(more info here), which provide more info on an XBRL fact's physical location in the form, which is sometimes useful for dealing with problems like the one highlighted above.The persistent problem is deciding when/what information to use to attempt automatically link data, and when manual intervention is required. This issue will track the development of a unified process to be applied to the initial set of tables of interest during the integration process.