Open zaneselvans opened 1 year ago
Poked around a bit. One finding: about 15% of these bad_calc values have a balance difference that's exactly equal to the transfer column.
pis['balance_diff'] = pis['ending_balance_calc']-pis['ending_balance']
cols_to_check = ['record_id', 'report_year',
"starting_balance",
"additions",
"retirements",
"adjustments",
"transfers",
'ending_balance',
'ending_balance_calc',
'balance_diff'
]
subset = pis.loc[pis.record_id.isin(bad_calc.record_id), pis.columns.isin(cols_to_check)]
trans = subset.loc[subset.transfers == subset.balance_diff]
trans.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36 entries, 2812 to 50179
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 report_year 36 non-null Int64
1 starting_balance 36 non-null float64
2 additions 36 non-null float64
3 retirements 33 non-null float64
4 adjustments 2 non-null float64
5 transfers 36 non-null float64
6 ending_balance 36 non-null float64
7 record_id 36 non-null string
8 ending_balance_calc 36 non-null float64
9 balance_diff 36 non-null float64
Huh, that's interesting. Should we adjust the ending balances to account for the transfers?
Looking at the dbf_to_xbrl.csv
file that tracks which years form changes happened in, it seems unlikely that these errors are due to respondents using the wrong version of the form. Thef1_plant_in_service
table was unchanged from 1994 until 2003.
A small number of records in the
plant_in_service_ferc1
table do not sum up to their reportedending_balance
. If we calculate the expected value and compare it to the reported value we find:The number of bad records is small, but some of them are very bad Like off by a factor of 10,000,000x bad. This will break aggregation operations across categories seriously and needs to be addressed somehow.
Possible sources of error
102_sold
or theretirements
column).