catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
463 stars 106 forks

Fill in fuel type and units on ingest of FERC1 fuel data #101

Open zaneselvans opened 7 years ago

zaneselvans commented 7 years ago

There are some cases in which we get good, clean category information about a plant and its fuel for some years but not for others. Right now this means we may be unnecessarily dropping data that we could keep, if we were willing to assign the categories for the years where they are missing.

For instance, in 2014-2015, (respondent_id=2, plant_name=Barry) has nothing in the fuel or fuel_unit fields in the FERC Form 1 database clone, but for 2007-2012 it has fuel=coal and fuel_unit=tons, and there are good numerical values in the fuel_quantity, fuel_avg_heat, fuel_cost_btu, and other fields.

Once we have generated time series for each FERC plant and inserted the new Plant ID values into the fuel_ferc1 table, we can go ahead and fill in many of these missing categorical values. This will also require addressing issue #39, since we'll need to retain many imperfect records that are currently being dropped in order to fix them.
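A rough sketch of what that fill could look like, using only the Python standard library. Everything here is an assumption for illustration: the `fill_missing_categories` helper is hypothetical, the rows are made up apart from the Barry fuel and fuel_unit values mentioned above, the second plant is invented to show a conflict, and (respondent_id, plant_name) stands in for the Plant ID that doesn't exist yet. A blank category is filled only when every other year for that plant agrees on a single value.

```python
from collections import defaultdict

# Illustrative rows only. (respondent_id, plant_name) stands in for the
# future Plant ID; the second plant is invented to show a conflicting case.
records = [
    {"respondent_id": 2, "plant_name": "Barry", "report_year": 2011, "fuel": "coal", "fuel_unit": "tons"},
    {"respondent_id": 2, "plant_name": "Barry", "report_year": 2012, "fuel": "coal", "fuel_unit": "tons"},
    {"respondent_id": 2, "plant_name": "Barry", "report_year": 2014, "fuel": None, "fuel_unit": None},
    {"respondent_id": 2, "plant_name": "Barry", "report_year": 2015, "fuel": "", "fuel_unit": ""},
    {"respondent_id": 7, "plant_name": "Example", "report_year": 2011, "fuel": "coal", "fuel_unit": "tons"},
    {"respondent_id": 7, "plant_name": "Example", "report_year": 2012, "fuel": "gas", "fuel_unit": "mcf"},
    {"respondent_id": 7, "plant_name": "Example", "report_year": 2013, "fuel": None, "fuel_unit": None},
]


def fill_missing_categories(rows, key_cols, fill_cols):
    """Return copies of rows with blank fill_cols filled in, but only
    where the plant's non-blank values for that column all agree."""
    # First pass: collect the set of known values per plant and column.
    known = defaultdict(lambda: defaultdict(set))
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        for col in fill_cols:
            if row[col]:  # skips both None and empty strings
                known[key][col].add(row[col])

    # Second pass: fill blanks where exactly one known value exists.
    filled = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        new_row = dict(row)  # leave the input rows untouched
        for col in fill_cols:
            values = known[key][col]
            if not new_row[col] and len(values) == 1:
                new_row[col] = next(iter(values))
        filled.append(new_row)
    return filled


filled = fill_missing_categories(
    records, ["respondent_id", "plant_name"], ["fuel", "fuel_unit"]
)
```

With these rows, the 2014 and 2015 Barry records pick up coal and tons, while the invented plant with both coal and gas on record keeps its blank, since there's no single value to assign. Plants that legitimately burn more than one fuel would need something smarter than this.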

zaneselvans commented 5 years ago

It seems like this is a very similar problem to the one that @cmgosnell has been working on with backfilling constant values (like location) in the EIA 860 data. Maybe the same tool can be applied?

cmgosnell commented 5 years ago

:-/ Sort of. What I've been doing is not backfilling. I've been trying to find the canonical annual or static values from multiple tables and store that info in one table, as opposed to all over the place. But for the static info, this can be used to backfill. In any case, the process is pretty generalizable and could certainly be applied to FERC.
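To make the idea concrete, here's a minimal sketch of picking a canonical static value from observations scattered across tables, and then using it to backfill. This is not the actual harvesting code. The `canonical_values` helper, the table names, the example rows, and the 60% agreement threshold (`min_share`) are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up observations of one static attribute (state), as it might be
# reported for the same plant in several tables and years.
observations = [
    {"plant_id": 1, "table": "plants", "report_year": 2015, "state": "AL"},
    {"plant_id": 1, "table": "generators", "report_year": 2016, "state": "AL"},
    {"plant_id": 1, "table": "ownership", "report_year": 2016, "state": "GA"},
    {"plant_id": 1, "table": "plants", "report_year": 2017, "state": None},
    {"plant_id": 2, "table": "plants", "report_year": 2015, "state": "CO"},
    {"plant_id": 2, "table": "generators", "report_year": 2016, "state": "WY"},
]


def canonical_values(rows, id_col, value_col, min_share=0.6):
    """Pick the most common non-null value per entity, or None when no
    value reaches min_share of that entity's non-null observations."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[value_col] is not None:
            counts[row[id_col]][row[value_col]] += 1

    canonical = {}
    for entity, counter in counts.items():
        value, n = counter.most_common(1)[0]
        share = n / sum(counter.values())
        canonical[entity] = value if share >= min_share else None
    return canonical


canon = canonical_values(observations, "plant_id", "state")

# Backfill: keep reported values, and fill nulls from the canonical table.
backfilled = [
    {**row, "state": row["state"] or canon[row["plant_id"]]}
    for row in observations
]
```

Here plant 1 resolves to AL (two of its three non-null reports agree), so its null 2017 record gets AL. Plant 2 is split evenly between CO and WY, so no canonical value is chosen. The same approach could in principle be pointed at FERC fuel and fuel_unit, with the threshold tuned to how noisy the reported categories are.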