Closed zaneselvans closed 4 years ago
Okay, within these EIA 923 tables, it looks like mostly there are columns that have constrained values which we should be checking. Are there structural things or other kinds of checks that should be done in this data? Things that wouldn't already be getting checked based on the database structure and lists of allowable values?
Also... can these tests be run on the aggregated versions of the dataframe? Monthly / Annual? Are these values still included and expected to be valid?
For the types of data validation tests we've got written up, there doesn't really seem to be anything to test in the gen_eia923 table. I could imagine taking some plant or generator information from EIA860, and verifying whether the amount of net generation is plausible given the plant/generator that it's associated with, but that also seems like something that might be better to check in the MCOE / capacity factor validation.
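To make the net generation idea concrete, here is a rough sketch of what such a plausibility check might look like. It only compares reported net generation against what the nameplate capacity could produce over the period. The function names, the argument names, and the tolerance bounds are all assumptions made for illustration; none of them come from the existing test suite or from the EIA tables themselves.

```python
HOURS_PER_YEAR = 8760  # non-leap year; an assumption for annual records


def implied_capacity_factor(net_generation_mwh, capacity_mw, hours=HOURS_PER_YEAR):
    """Return net generation divided by the maximum possible generation.

    Returns None when capacity or hours are not positive, since the ratio
    is undefined in that case.
    """
    if capacity_mw <= 0 or hours <= 0:
        return None
    return net_generation_mwh / (capacity_mw * hours)


def is_plausible(net_generation_mwh, capacity_mw, hours=HOURS_PER_YEAR,
                 min_cf=-0.1, max_cf=1.05):
    """Flag records whose implied capacity factor falls outside loose bounds.

    min_cf and max_cf are hypothetical tolerances: slightly negative net
    generation (station use) and slight overshoot of nameplate are allowed.
    """
    cf = implied_capacity_factor(net_generation_mwh, capacity_mw, hours)
    if cf is None:
        return False
    return min_cf <= cf <= max_cf


# A 1 MW generator reporting 4,380 MWh in a year ran at half of nameplate.
half_load_ok = is_plausible(4380.0, 1.0)
# The same generator reporting 20,000 MWh exceeds what it could produce.
too_much_ok = is_plausible(20000.0, 1.0)
```

A check like this depends on joining generation records to capacity data, which is part of why it may fit better alongside the MCOE / capacity factor validation.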
Okay, calling this closed for now. Underlying data issues that were revealed (but not yet addressed) include:
(frc_eia923)
(frc_eia923, gf_eia923, bf_eia923)
(gf_eia923)
For each of the EIA 923 data tables, we need to create a suite of data validity tests -- at least high level sanity checks -- that can be run to ensure nothing weird has happened that's affected the content of the dataset. These can include checking for excessive outlier values, ensuring that median values are within an expected range, etc. See the test_fbp_ferc1() function for some examples.
Tables that need this kind of sanity check include at least:
Raw (not aggregated by month/year):
- frc_eia923
  - ash_content_pct (by fuel, min, max, median)
  - chlorine_content_ppm (by fuel, min, max, median)
  - fuel_cost_per_mmbtu (by fuel, min, max, median)
  - heat_content_per_unit (by fuel, min, max, median)
  - mercury_content_ppm (by fuel, min, max, median)
  - moisture_content_pct (by fuel, min, max, median)
  - sulfur_content_pct (by fuel, min, max, median)
- gf_eia923
  - fuel_mmbtu_per_unit (by fuel, min, max, median)
- bf_eia923
  - ash_content_pct (by fuel, min, max, median)
  - fuel_mmbtu_per_unit (by fuel, min, max, median)
  - sulfur_content_pct (by fuel, min, max, median)
- gen_eia923
Aggregated (by month / year):
- frc_eia923
  - ash_content_pct (by fuel, min, max, median)
  - chlorine_content_ppm (by fuel, min, max, median)
  - fuel_cost_per_mmbtu (by fuel, min, max, median)
  - heat_content_per_unit (by fuel, min, max, median)
  - mercury_content_ppm (by fuel, min, max, median)
  - moisture_content_pct (by fuel, min, max, median)
  - sulfur_content_pct (by fuel, min, max, median)
- gf_eia923
  - fuel_mmbtu_per_unit (by fuel, min, max, median): definitely problems here, e.g. for gas a bunch of records end up having heat content that's only about half what it should be. See #389.
- bf_eia923
  - ash_content_pct (by fuel, min, max, median)
  - fuel_mmbtu_per_unit (by fuel, min, max, median)
  - sulfur_content_pct (by fuel, min, max, median)
- gen_eia923
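As a starting point for the "by fuel, min, max, median" checks listed above, something like the following could work. This is a minimal sketch in plain Python rather than the project's actual test code: the record layout, the fuel_type key, the check_column_by_fuel helper, and the numeric bounds are all hypothetical placeholders, and the same grouping would more naturally be done with a dataframe groupby in practice.

```python
from collections import defaultdict
from statistics import median

# Hypothetical bounds per (column, fuel):
# (lowest allowed value, highest allowed value, median low, median high).
# The numbers are illustrative placeholders, not vetted limits.
BOUNDS = {
    ("ash_content_pct", "coal"): (0.0, 50.0, 2.0, 15.0),
}


def check_column_by_fuel(records, column, bounds, fuel_key="fuel_type"):
    """Return a list of failure messages for one column, grouped by fuel."""
    values = defaultdict(list)
    for rec in records:
        val = rec.get(column)
        if val is not None:  # skip missing values
            values[rec[fuel_key]].append(val)

    failures = []
    for fuel, vals in sorted(values.items()):
        limits = bounds.get((column, fuel))
        if limits is None:
            continue  # no expectations defined for this fuel
        lo, hi, med_lo, med_hi = limits
        if min(vals) < lo:
            failures.append(f"{column}/{fuel}: min {min(vals)} below {lo}")
        if max(vals) > hi:
            failures.append(f"{column}/{fuel}: max {max(vals)} above {hi}")
        med = median(vals)
        if not med_lo <= med <= med_hi:
            failures.append(
                f"{column}/{fuel}: median {med} outside [{med_lo}, {med_hi}]"
            )
    return failures


# Made-up records: two ordinary coal deliveries, one outlier, one missing value.
records = [
    {"fuel_type": "coal", "ash_content_pct": 8.0},
    {"fuel_type": "coal", "ash_content_pct": 9.5},
    {"fuel_type": "coal", "ash_content_pct": 72.0},
    {"fuel_type": "gas", "ash_content_pct": None},
]
failures = check_column_by_fuel(records, "ash_content_pct", BOUNDS)
```

The same function could in principle be pointed at either the raw or the aggregated records, though the expected ranges would likely need to differ between the two.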