catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars, 106 forks

Define string cleaning dictionaries for f1_steam fields #35

Closed: zaneselvans closed this issue 7 years ago

zaneselvans commented 7 years ago

The f1_steam table in the ferc1 database has at least two freeform fields that need to be cleaned up: type_const and plant_kind.

Export a list of all the unique strings found in those two fields, drawn from all of the data we can import into the database simultaneously (years 2004-2015). Using whatever information you can find about what those fields are supposed to describe (e.g. the blank FERC Form 1 document and the instructions for filling it out), sort the strings into a few meaningful categories, using the ferc1_fuel_strings and ferc1_fuel_unit_strings dictionaries of lists in constants.py as a model. Look at whatever other fields in the f1_steam table you need for context on what the type_const and plant_kind values mean.

This issue is complete when constants.py contains ferc1_type_const and ferc1_plant_kind dictionaries that can be used to clean up these columns.
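For anyone picking this up, here is a rough sketch of how a dictionary of lists could be used to clean a freeform column. The category names, the example strings, and the helper functions (normalize, unique_strings, invert_string_map, clean_strings) are all assumptions made up for illustration. They are not the actual contents of constants.py or the real values found in f1_steam.

```python
# Hypothetical example only: the categories and raw strings below are
# invented for illustration and are NOT taken from the real FERC Form 1 data.
ferc1_plant_kind_example = {
    "steam": ["steam", "steam turbine", "steam-turbine"],
    "combustion_turbine": ["combustion turbine", "gas turbine", "ct"],
    "combined_cycle": ["combined cycle", "comb. cycle", "cc"],
}


def normalize(raw):
    """Lowercase a raw value and collapse runs of whitespace."""
    return " ".join(str(raw).lower().split())


def unique_strings(values):
    """Return the sorted unique normalized strings found in a column."""
    return sorted({normalize(v) for v in values})


def invert_string_map(categories):
    """Turn {category: [variants]} into {variant: category}."""
    lookup = {}
    for category, variants in categories.items():
        for variant in variants:
            key = normalize(variant)
            # A variant listed under two categories would be ambiguous.
            if key in lookup and lookup[key] != category:
                raise ValueError(f"{key!r} appears in more than one category")
            lookup[key] = category
    return lookup


def clean_strings(values, categories, unmapped=""):
    """Map each raw value to its category, or to `unmapped` if unknown."""
    lookup = invert_string_map(categories)
    return [lookup.get(normalize(v), unmapped) for v in values]


raw_column = ["Steam Turbine", "  CT ", "nuclear", "steam turbine"]
print(unique_strings(raw_column))
print(clean_strings(raw_column, ferc1_plant_kind_example))
```

Running unique_strings over the exported column first gives the list to categorize by hand. Anything clean_strings leaves unmapped is a string that still needs a home in one of the categories.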

swinter2011 commented 7 years ago

@zaneselvans take a look, this should be done

swinter2011 commented 7 years ago

@zaneselvans should I mark this issue as closed?