cmgosnell closed 2 years ago
I think we should probably go ahead and map all the columns even if they seem not particularly useful, since we don't really understand all the possible use cases, and we're trying to provide programmatic access to the underlying dataset... whatever it was. It might be a little tedious but it shouldn't be too difficult, should it?
@stevenbwinter @swinter2011 here's an issue related to the unmapped columns...
this feels very very separable from getting the er data integrated - which is a high priority - so I'd propose we wait. doing this in tandem would entangle two tasks and make the higher priority task slower to accomplish. we can and should put this on the docket right after the early release stuff gets integrated.
also here is the new list of columns from the latest etl readout:
2022-08-04 12:49:16 [ WARNING] pudl.extract.excel:260 Extra columns found in page boiler_generator_assn: {'generator_association', 'plant_name', 'steam_plant_type', 'utility_name'}
2022-08-04 12:49:36 [ WARNING] pudl.extract.excel:260 Extra columns found in page generator: {'fercdock', 'winter_capacity', 'summer_capacity', 'fercother', 'fercewgdoc', 'planned_derates_net_summer_cap', 'ferccogen'}
2022-08-04 12:51:04 [ WARNING] pudl.extract.excel:260 Extra columns found in page generator_existing: {'planned_energy_source_1'}
2022-08-04 12:51:10 [ WARNING] pudl.extract.excel:260 Extra columns found in page generator_proposed: {'winter_estimated_capacity', 'winter_capacity', 'summer_capacity', 'summer_estimated_capacity'}
2022-08-04 12:51:56 [ WARNING] pudl.extract.excel:260 Extra columns found in page plant: {'ferc_exempt_wholesale_generator_docket_number', 'ownertransdist'}
2022-08-04 12:52:04 [ WARNING] pudl.extract.excel:260 Extra columns found in page utility: {'areacode'}
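If it helps when we come back to this, here is a rough sketch for turning warnings like the ones above into a per-page list of unmapped columns. The function name `parse_extra_columns` and the regex are my own assumptions based on the log format shown here, not anything that exists in pudl.

```python
import ast
import re

# Assumed pattern, based only on the warning text shown in this thread:
# "Extra columns found in page <page>: {<set of column names>}"
PATTERN = re.compile(r"Extra columns found in page (\w+): (\{.*\})")


def parse_extra_columns(log_text):
    """Return {page: sorted list of unmapped columns} from ETL warning lines."""
    extras = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            page, cols = match.groups()
            # The column set is printed as a Python set literal, so
            # ast.literal_eval can read it back safely.
            extras[page] = sorted(ast.literal_eval(cols))
    return extras


sample = """\
2022-08-04 12:51:56 [ WARNING] pudl.extract.excel:260 Extra columns found in page plant: {'ferc_exempt_wholesale_generator_docket_number', 'ownertransdist'}
2022-08-04 12:52:04 [ WARNING] pudl.extract.excel:260 Extra columns found in page utility: {'areacode'}
"""
extra_by_page = parse_extra_columns(sample)
```

Sorting the columns just makes the result stable between runs, since the sets in the log come out in arbitrary order.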
While running an etl w/ all years (2008 - 2018) I'm getting these warnings about the columns from the extracted data not matching up with the column maps:
Note: this is not necessarily a failure. I incorporated these warnings into the extract step mostly for 861 because we are mapping with locations instead of strings.
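For anyone reading along, the check behind these warnings is roughly a set difference between the extracted columns and the column map. This is only an illustrative sketch: `find_extra_columns` and the shape of `column_map` are hypothetical, not the actual code in `pudl.extract.excel`.

```python
import logging

logger = logging.getLogger(__name__)


def find_extra_columns(page, extracted_columns, column_map):
    """Return the extracted columns that the column map does not cover.

    column_map is a hypothetical {mapped_name: raw_name} dict for one page.
    """
    extra = set(extracted_columns) - set(column_map)
    if extra:
        # Warn rather than raise: an unmapped column is not necessarily a failure.
        logger.warning("Extra columns found in page %s: %s", page, extra)
    return extra


extra = find_extra_columns(
    "utility",
    ["utility_id", "utility_name", "areacode"],
    {"utility_id": "Utility ID", "utility_name": "Utility Name"},
)
```

Logging a warning instead of raising keeps the ETL running while still surfacing columns nobody has mapped yet.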
I did a quick investigation of the boiler_generator_assn table and the non-mapped columns were a Steam Plant Type, which is a code from the most recent years which... doesn't seem super useful on the face of it, and an old column which is basically an observed or "theoretical" boolean.

Steam plant type description from 860:
1 = Plants with combustible-fueled steam-electric generators with a sum of 100 MW or more steam-electric nameplate capacity (including combined cycle steam-electric generators with duct firing).
2 = Plants with combustible-fueled steam-electric generators with a sum of 10 MW or more but less than 100 MW steam-electric nameplate capacity (including combined cycle steam-electric generators with duct firing).
3 = Plants with nuclear fueled generators, combined cycle steam-electric generators without duct firing and solar thermal electric generators using a steam cycle with a sum of 100 MW or more steam-electric nameplate capacity.
4 = Plants with non-steam fueled electric generators (wind, PV, geothermal, fuel cell, combustion turbines, IC engines, etc.) and electric generators not meeting conditions of categories above.