Closed cmgosnell closed 2 years ago
Column mapping notes:
`plant_name_ferc1` has two options which are almost entirely the same:
steam_extracted_xbrl = pudl.helpers.simplify_strings(steam_extracted_xbrl, ["PlantName", "PlantNameAxis"])
steam_extracted_xbrl[steam_extracted_xbrl.PlantName != steam_extracted_xbrl.PlantNameAxis]
index | entity_id | filing_name | start_date | end_date | PlantNameAxis | OrderNumber | PlantName | PlantKind | PlantConstructionType | YearPlantOriginallyConstructed | YearLastUnitOfPlantInstalled | NetPeakDemandOnPlant | PlantHoursConnectedToLoad | NetContinuousPlantCapability | NetContinuousPlantCapabilityNotLimitedByCondenserWater | NetContinuousPlantCapabilityLimitedByCondenserWater | PlantAverageNumberOfEmployees | NetGenerationExcludingPlantUse | CostPerKilowattOfInstalledCapacity | OperationSupervisionAndEngineeringExpense | FuelSteamPowerGeneration | CoolantsAndWater | SteamExpensesSteamPowerGeneration | SteamFromOtherSources | SteamTransferredCredit | ElectricExpensesSteamPowerGeneration | MiscellaneousSteamPowerExpenses | RentsSteamPowerGeneration | Allowances | MaintenanceSupervisionAndEngineeringSteamPowerGeneration | MaintenanceOfStructuresSteamPowerGeneration | MaintenanceOfBoilerPlantSteamPowerGeneration | MaintenanceOfElectricPlantSteamPowerGeneration | MaintenanceOfMiscellaneousSteamPlant | PowerProductionExpensesSteamPower | ExpensesPerNetKilowattHour |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12 | C001130 | 533aafa3-c8d2-4c5d-9a31-b3fe1ed29f21 | 2021-01-01 | 2021-12-31 | $— | 8.0 | — | None | None | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
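The comparison above can be tried on a toy frame. This is a minimal sketch in plain pandas, assuming the simplification amounts to lowercasing, stripping, and collapsing whitespace. `simplify` is a hypothetical stand-in rather than `pudl.helpers.simplify_strings`, and the sample rows are invented; only the two column names come from the notes.

```python
import pandas as pd


def simplify(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical stand-in for string simplification (not the PUDL helper)."""
    out = df.copy()
    for col in columns:
        out[col] = (
            out[col]
            .astype("string")
            .str.lower()
            .str.strip()
            .str.replace(r"\s+", " ", regex=True)
        )
    return out


# Invented sample rows; only the column names come from the notes above.
toy = pd.DataFrame(
    {
        "PlantNameAxis": ["Big Bend ", "$—", "Crystal  River"],
        "PlantName": ["big bend", "—", "crystal river"],
    }
)

simplified = simplify(toy, ["PlantName", "PlantNameAxis"])
# Rows where the two candidate name columns still disagree after simplification.
mismatches = simplified[simplified.PlantName != simplified.PlantNameAxis]
print(mismatches)
```

If only placeholder rows like the `$—` one survive the filter, either column is probably a reasonable source for the plant name.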
- `asset_retirement_cost`: does this exist in the table anymore?!?!
@zaneselvans do you think it's a good idea to move all of the mapping that is currently being done via `pudl.helpers.cleanstrings` into the metadata encoder?
We talked about this on the phone, but I think that, given there are literally thousands of bad strings which we're cleaning up in (especially) the FERC 1 tables, it's a qualitatively different problem, even if structurally it's similar or the same. Rather than transforming several iterations of historical codes, plus a few weird values, to canonical codes, it's more like a wholesale categorization of something that's not even attempting to be a coded column to begin with.
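To make that distinction concrete, here is a rough sketch of the two problems side by side. All codes, strings, and category names are invented for illustration, and neither function is the metadata encoder or `cleanstrings` itself.

```python
from typing import Optional

# Case 1: a coded column. A few historical variants map to canonical codes.
CODE_FIXES = {"ST": "ST", "st": "ST", "S": "ST", "GT": "GT", "gt": "GT"}

# Case 2: free text. Each category needs a list of observed spellings
# (in the real tables these lists would be far longer than shown here).
CATEGORY_STRINGS = {
    "steam": ["steam", "stm", "steam turbine", "steam (coal)"],
    "combustion_turbine": ["gas turbine", "comb. turb", "combustion turbine"],
}
# Invert to a lookup from observed string to category.
STRING_TO_CATEGORY = {
    s: cat for cat, strings in CATEGORY_STRINGS.items() for s in strings
}


def recode(value: str) -> Optional[str]:
    """Map a historical code to its canonical code, or None if unknown."""
    return CODE_FIXES.get(value)


def categorize(value: str) -> Optional[str]:
    """Map a free-form string to a category, or None if unrecognized."""
    return STRING_TO_CATEGORY.get(value.strip().lower())


print(recode("st"), categorize(" Steam Turbine "), categorize("unknown thing"))
```

Structurally both are dictionary lookups, which is why they look the same. The difference is in the size and provenance of the mapping.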
I think given that we've merged the steam-table-specific stuff into `xbrl_steam`, and have #1705 as a separate issue for dealing with the utility ID assignments, this can probably be closed now.
Below are my (WIP!) notes about how to rearrange the transform step for the steam table:
pre-concat but mirrored (as in same steps diff inputs)
- DBF
  - `_clean_cols`?: most of this function is pretty dang dbf specific, but we'll need analogous xbrl cleaning as well
  - `assign_record_id_xbrl` -> `assign_record_id`
- XBRL
  - `_instant` and `_duration` tables

Post concat
- `cleanstrings`
- `oob_to_nan`
- `_plants_steam_assign_plant_ids`
(we decided that this needs to be post concat right?
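
The arrangement in these notes could look roughly like the sketch below: source-specific cleanup and a shared record-ID step before concatenation, then shared cleaning afterwards. Every function, column name, ID format, and bound here is a hypothetical placeholder rather than the PUDL implementation, and plant ID assignment is left out.

```python
import pandas as pd


def clean_dbf(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical DBF-side cleanup: rename to the shared column names."""
    return raw.rename(columns={"plant_name": "plant_name_ferc1"}).assign(source="dbf")


def clean_xbrl(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical XBRL-side cleanup mirroring the DBF step."""
    return raw.rename(columns={"PlantName": "plant_name_ferc1"}).assign(source="xbrl")


def assign_record_id(df: pd.DataFrame) -> pd.DataFrame:
    """One shared record-ID step for either source (the ID format is invented)."""
    row = pd.Series(range(len(df)), index=df.index).astype(str)
    ids = df["source"] + "_" + df["report_year"].astype(str) + "_" + row
    return df.assign(record_id=ids)


def post_concat(df: pd.DataFrame) -> pd.DataFrame:
    """Shared cleaning once both sources are in one frame."""
    out = df.copy()
    out["plant_name_ferc1"] = out["plant_name_ferc1"].str.strip().str.lower()
    # Out-of-bounds construction years become NaN (bounds are placeholders).
    year = out["construction_year"]
    out["construction_year"] = year.where(year.between(1850, 2030))
    return out


# Invented one-row inputs for each source.
dbf_raw = pd.DataFrame(
    {"report_year": [2020], "plant_name": [" Big Bend"], "construction_year": [1970]}
)
xbrl_raw = pd.DataFrame(
    {"report_year": [2021], "PlantName": ["Big Bend "], "construction_year": [19700]}
)

combined = pd.concat(
    [assign_record_id(clean_dbf(dbf_raw)), assign_record_id(clean_xbrl(xbrl_raw))],
    ignore_index=True,
)
steam = post_concat(combined)
print(steam)
```

Keeping the pre-concat steps mirrored means anything written after the `pd.concat` call only has to handle one schema.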