Open zaneselvans opened 4 years ago
@cmgosnell and I are going to help get @knordback working on this issue as a way to become more familiar with the harvesting process, working with our code, Jupyter, etc.
@cmgosnell while talking over some of these fields with @knordback yesterday, I noticed that the associated_combined_heat_power field is part of the generators_entity_eia table, but there's another combined_heat_power field being reported in e.g. the generation_fuel_eia923 table, and looking at the spreadsheets, it seems like that field pertains to the plant (which makes some sense given that generation_fuel_eia923 is reported on a date, plant, prime-mover, fuel basis).
Are these different attributes? Should there be a CHP field at both the generator and the plant level? Should this really be a permanent attribute, or is it another one that changes slowly? Does the generator field really just indicate that the generator is part of a plant that does CHP? Or that it's part of a generation unit that does CHP? Could the plant or plant-prime-fuel level CHP status be inferred from the generator-level CHP attributes?
Right now we're discarding the CHP column reported in generation_fuel_eia923.
@grgmiller or @gschivley do either of you have more context on the relationship between these two different CHP fields?
I don't know exactly. associated_combined_heat_power originates in the generator table. I would not be surprised if there were plants that had some units contributing to a CHP and some that just generated power. I don't think it's generally a good idea to base any logic about the workings of a plant on the reporting structure of the generation_fuel_eia923 table. I personally would check whether this value is actually consistent across all generators within a plant before thinking about moving it. But I could also definitely imagine this changing over time (albeit very rarely!).
It seems like we should probably do an exhaustive check of all the currently "permanent" generator attributes on the pre-harvested dataframes... and see how permanent they actually are.
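A rough sketch of what that check could look like with pandas. The dataframe here is toy data with invented values, and the column names plant_id_eia, generator_id, and report_date are assumptions about what the pre-harvest generator dataframe contains. The helper count_distinct_values is hypothetical, not an existing PUDL function.

```python
import pandas as pd


def count_distinct_values(df, id_cols, attr):
    """Count distinct non-null values of `attr` within each group of `id_cols`."""
    return df.groupby(id_cols)[attr].nunique(dropna=True)


# Toy stand-in for a pre-harvest generators dataframe (values are invented).
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1, 2, 2],
        "generator_id": ["a", "a", "b", "b", "a", "a"],
        "report_date": pd.to_datetime(["2019-01-01", "2020-01-01"] * 3),
        "associated_combined_heat_power": [True, True, False, False, False, True],
    }
)
attr = "associated_combined_heat_power"

# Is the attribute really "permanent"? Generators with more than one distinct
# value across report dates are not.
per_generator = count_distinct_values(gens, ["plant_id_eia", "generator_id"], attr)
changing_generators = per_generator[per_generator > 1]

# Is the attribute consistent across generators within a plant in a given year?
per_plant_year = count_distinct_values(gens, ["plant_id_eia", "report_date"], attr)
mixed_plants = per_plant_year[per_plant_year > 1]

print(changing_generators)
print(mixed_plants)
```

Running the same two groupings over every attribute currently treated as permanent would give a quick picture of which ones actually vary, and whether any of them could safely be moved to the plant level.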
I do not have any context on these two fields.
I'll hold off on this one for now.
I think this is mostly done. Based on notes above I left in code dropping some of the fields in clean_generation_fuel_eia923() and clean_fuel_receipts_costs_eia923(), but I'm not certain I'm interpreting the notes correctly. There's also implicit dropping in plants_eia923(), and I don't know if that's as desired or not.
In many of our older EIA transformation functions, we preemptively drop columns from the tables that are being processed, in order to produce normalized tables. However, many of these columns contain information about the entities (plants, generators, utilities) that should be integrated into the entity harvesting and resolution process, which happens after the transform step.
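One way to audit which columns a transform function discards is to compare the columns going in against the columns coming out. This is only a sketch: dropped_columns and hypothetical_transform are made-up names for illustration, not the real pudl.transform functions, and the toy dataframe is invented.

```python
import pandas as pd


def dropped_columns(raw_df, transform):
    """List columns present in `raw_df` but absent from `transform(raw_df)`.

    Renamed columns also show up here, so the result needs a manual look.
    """
    out = transform(raw_df.copy())
    return sorted(set(raw_df.columns) - set(out.columns))


def hypothetical_transform(df):
    """Stand-in for an older transform that preemptively drops entity columns."""
    return df.drop(columns=["plant_state", "combined_heat_power"])


# Toy input standing in for an extracted table (values are invented).
raw = pd.DataFrame(
    {
        "plant_id_eia": [1, 2],
        "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "net_generation_mwh": [100.0, 250.0],
        "plant_state": ["CO", "TX"],
        "combined_heat_power": [False, True],
    }
)

discarded = dropped_columns(raw, hypothetical_transform)
print(discarded)
```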
Discarded Columns
pudl.metadata.fields
column_map.csv under src/pudl/package_data/{data source}/ so that it matches the DB schema.
total_fuel_consumption_mmbtu is an annual total of monthly values that are retained, and so we don't need it.
EIA-860
pudl.transform.eia860.ownership()
pudl.transform.eia860.generators()
pudl.transform.eia860.plants()
pudl.transform.eia860.utilities()
EIA-923
pudl.transform.eia923.plants()
pudl.transform.eia923.generation_fuel()
combined_heat_power
plant_name_eia
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
plant_state
census_region
nerc_region
naics_code
fuel_unit (should probably be dropped, since unit is implied by fuel type)
total_fuel_consumption_quantity (annual total?)
electric_fuel_consumption_quantity (annual total?)
total_fuel_consumption_mmbtu (annual total?)
elec_fuel_consumption_mmbtu (annual total?)
net_generation_megawatthours (annual total?)
early_release
pudl.transform.eia923.boiler_fuel()
This one may give you trouble. See #1847 and #1836.
combined_heat_power
plant_name_eia
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
plant_state
census_region
nerc_region
naics_code
fuel_unit (should probably be dropped, since unit is implied by fuel type)
total_fuel_consumption_quantity (annual total?)
balancing_authority_code_eia
early_release
reporting_frequency_code
data_maturity (WE add this field in the extraction... getting dropped b/c of aggregations. See #1847)
pudl.transform.eia923.generation()
combined_heat_power
plant_name_eia
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
plant_state
census_region
nerc_region
naics_code
early_release
pudl.transform.eia923.coalmine()
pudl.transform.eia923.fuel_receipts_costs()
plant_name_eia
plant_state
operator_name (probably utility_name_eia)
operator_id (probably utility_id_eia)
mine_id_msha (should be dropped)
mine_type_code (should be dropped)
state (of the mine?)
county_id_fips (of the mine?)
state_id_fips (of the mine?)
mine_name (should be dropped)
regulated (mine or plant?)
early_release