catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Refactor `plants_steam_ferc1` transform for XBRL + DBF inputs #1707

Closed by cmgosnell 2 years ago

cmgosnell commented 2 years ago

Below are my (WIP!) notes about how to rearrange the transform step for the steam table:

Pre-concat but mirrored (i.e., the same steps applied to different inputs)

DBF

XBRL

Post-concat
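
The three-phase layout in these notes can be sketched roughly as below. This is only an illustration of the shape (source-specific steps that mirror each other, a concat, then shared steps), not the actual PUDL transform. The DBF column names, the output column names, and every function name are hypothetical; only the XBRL names `PlantName`, `NetGenerationExcludingPlantUse`, and `end_date` come from the column dump in this thread. Plain lists of dicts stand in for dataframes to keep the sketch dependency-free.

```python
# Hypothetical raw records, one per source format.
dbf_raw = [
    {"plant_name": " Big Steam ", "report_year": 2020, "net_generation": 100.0},
]
xbrl_raw = [
    {
        "PlantName": "Big Steam",
        "end_date": "2021-12-31",
        "NetGenerationExcludingPlantUse": 120.0,
    },
]


def rename(records, mapping):
    """Rename keys in each record; keys missing from the mapping are kept."""
    return [{mapping.get(k, k): v for k, v in rec.items()} for rec in records]


def pre_concat_dbf(records):
    """DBF-specific steps: the names already match, so only tag the source."""
    return [{**rec, "source": "dbf"} for rec in records]


def pre_concat_xbrl(records):
    """XBRL-specific steps: mirror the DBF steps, but with XBRL inputs."""
    renamed = rename(
        records,
        {
            "PlantName": "plant_name",
            "NetGenerationExcludingPlantUse": "net_generation",
        },
    )
    out = []
    for rec in renamed:
        rec = dict(rec)
        # Derive the report year from the filing period's end date.
        rec["report_year"] = int(rec.pop("end_date")[:4])
        rec["source"] = "xbrl"
        out.append(rec)
    return out


def post_concat(records):
    """Shared steps that only need to be written once, after the concat."""
    return [
        {**rec, "plant_name": " ".join(rec["plant_name"].lower().split())}
        for rec in records
    ]


def transform(dbf, xbrl):
    combined = pre_concat_dbf(dbf) + pre_concat_xbrl(xbrl)
    return post_concat(combined)


steam = transform(dbf_raw, xbrl_raw)
```

The point of the split is that anything depending on the source format lives in the two `pre_concat_*` functions, and everything else is written once in `post_concat`.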

cmgosnell commented 2 years ago

Column mapping notes:

| index | entity_id | filing_name | start_date | end_date | PlantNameAxis | OrderNumber | PlantName | PlantKind | PlantConstructionType | YearPlantOriginallyConstructed | YearLastUnitOfPlantInstalled | NetPeakDemandOnPlant | PlantHoursConnectedToLoad | NetContinuousPlantCapability | NetContinuousPlantCapabilityNotLimitedByCondenserWater | NetContinuousPlantCapabilityLimitedByCondenserWater | PlantAverageNumberOfEmployees | NetGenerationExcludingPlantUse | CostPerKilowattOfInstalledCapacity | OperationSupervisionAndEngineeringExpense | FuelSteamPowerGeneration | CoolantsAndWater | SteamExpensesSteamPowerGeneration | SteamFromOtherSources | SteamTransferredCredit | ElectricExpensesSteamPowerGeneration | MiscellaneousSteamPowerExpenses | RentsSteamPowerGeneration | Allowances | MaintenanceSupervisionAndEngineeringSteamPowerGeneration | MaintenanceOfStructuresSteamPowerGeneration | MaintenanceOfBoilerPlantSteamPowerGeneration | MaintenanceOfElectricPlantSteamPowerGeneration | MaintenanceOfMiscellaneousSteamPlant | PowerProductionExpensesSteamPower | ExpensesPerNetKilowattHour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | C001130 | 533aafa3-c8d2-4c5d-9a31-b3fe1ed29f21 | 2021-01-01 | 2021-12-31 | $— | 8.0 | None | None | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

- `asset_retirement_cost`: does this exist in the table anymore?!?!

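
As a possible starting point for the column mapping, the CamelCase XBRL names could be converted mechanically to snake_case and then adjusted by hand. This is a minimal sketch; `camel_to_snake` is a hypothetical helper, and the names it produces are not necessarily the column names PUDL uses for this table.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert an underscore at each lower-to-upper boundary, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# A few of the XBRL column names from the dump above.
xbrl_columns = [
    "PlantName",
    "PlantKind",
    "YearPlantOriginallyConstructed",
    "NetPeakDemandOnPlant",
    "ExpensesPerNetKilowattHour",
]

# First-pass rename map; entries would still need manual review.
rename_map = {col: camel_to_snake(col) for col in xbrl_columns}
```

Names that are already snake_case, such as `entity_id`, pass through unchanged, so the same helper can be applied to the whole header.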
cmgosnell commented 2 years ago

@zaneselvans do you think it's a good idea to move all of the mapping that is currently done via `pudl.helpers.cleanstrings` into the metadata encoder?

zaneselvans commented 2 years ago

We talked about this on the phone, but I think that given there are literally thousands of bad strings which we're cleaning up in (especially) the FERC 1 tables, it's a qualitatively different problem, even if structurally it's similar or the same. Rather than transforming several iterations of historical codes, plus a few weird values, to canonical codes, it's more like a wholesale categorization of something that's not even attempting to be a coded column to begin with.
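
To make the distinction concrete, here is a toy sketch of the two cases. It is not PUDL's encoder or the `cleanstrings` implementation. All mappings, values, and function names below are made up for illustration.

```python
# Case 1: a coded column. A handful of historical or odd codes get
# mapped to their canonical code; anything else is already a valid code.
CODE_FIXES = {"st": "ST", "Steam": "ST", "gt": "GT"}


def encode(value):
    """Return the canonical code, leaving unrecognized values untouched."""
    return CODE_FIXES.get(value, value)


# Case 2: a free-text column. Many arbitrary strings get binned into a
# few categories. In practice each list could hold hundreds of variants.
PLANT_KIND_STRINGS = {
    "steam": ["steam", "steam turbine", "stm", "conventional steam"],
    "nuclear": ["nuclear", "nuc", "atomic"],
}


def categorize(value, categories, unmapped=None):
    """Normalize case and whitespace, then look for the string in each bin."""
    cleaned = " ".join(value.lower().split())
    for category, variants in categories.items():
        if cleaned in variants:
            return category
    return unmapped


fixed = encode("st")
kind = categorize("  Steam   Turbine ", PLANT_KIND_STRINGS)
unknown = categorize("???", PLANT_KIND_STRINGS, unmapped="unknown")
```

The two functions look alike structurally, since both are dictionary lookups, but the second needs an ever-growing list of variants and an explicit fallback for strings nobody has categorized yet.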

zaneselvans commented 2 years ago

I think that, given we've merged the steam-table-specific stuff into `xbrl_steam` and have #1705 as a separate issue for dealing with the utility ID assignments, this can probably be closed now.