catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars 106 forks

Data sanity checks for FERC Form 1 #277

zaneselvans closed 4 years ago

zaneselvans commented 5 years ago

For each of the FERC Form 1 data tables, we need to create a suite of data validity tests -- at least high-level sanity checks -- that can be run to ensure nothing weird has happened to the content of the dataset. These can include checking for excessive outlier values, ensuring that median values are within an expected range, etc. See the test_fbp_ferc1() function for some examples.
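As a rough illustration of the two kinds of checks mentioned above (median within an expected range, and not too many outliers), here is a minimal plain-Python sketch. It is not the code in test_fbp_ferc1(); the helper names, the bounds, and the sample values are all hypothetical, and a real check would presumably run against a column of a PUDL output table rather than a list.

```python
from statistics import median


def _non_null(values):
    """Drop missing values (None) so they don't skew the checks."""
    clean = [v for v in values if v is not None]
    if not clean:
        raise ValueError("no non-null values to check")
    return clean


def check_median_in_range(values, low, high):
    """Return True if the median of the non-null values lies in [low, high]."""
    return low <= median(_non_null(values)) <= high


def outlier_fraction(values, low, high):
    """Return the fraction of non-null values that fall outside [low, high]."""
    clean = _non_null(values)
    outside = sum(1 for v in clean if v < low or v > high)
    return outside / len(clean)


def check_outliers(values, low, high, max_fraction=0.05):
    """Return True if no more than max_fraction of values are outliers."""
    return outlier_fraction(values, low, high) <= max_fraction


# Hypothetical fuel cost values with one obviously bad record.
# The bounds below are made up for illustration, not taken from PUDL.
sample = [1.8, 2.1, 2.3, 2.0, 2.2, 1.9, 2.4, 2.1, 2.0, 95.0, None]

median_ok = check_median_in_range(sample, low=1.0, high=5.0)
outliers_ok = check_outliers(sample, low=0.5, high=10.0, max_fraction=0.05)
```

With this sample the median check passes because a single bad record barely moves the median, while the outlier check fails because one value in ten is outside the bounds. That is why it is useful to run both kinds of check on the same column.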

Output tables that should be sanity checked include:

zaneselvans commented 4 years ago

The ferc1 plants beyond steam and fuel (and their bastard child, fuel_by_plant aka fbp) haven't been sufficiently worked over by PUDL to justify any kind of validation. They are... buyer-beware.

There are definitely more kinds of validation that can and should be done here, but at least these two tables (steam and fuel) have some checks to make sure we don't do something ridiculous.