catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

manage retired generators in unit ID creation #1149

Open cmgosnell opened 2 years ago

cmgosnell commented 2 years ago

@zaneselvans has made a lot of progress on a more complete unit_id_pudl, which groups collections of generators and (sometimes) boilers into cohesive units.

Something that came up while I was working on other things: we should probably consider how retired or proposed generators get clumped in with existing generators. Presumably, in any given year, no retired generator should be grouped in a unit with an existing generator. But the unit ID assignment is currently time-agnostic: it generates units that should apply across every year of data. So taking the operational status of generators into account during unit ID creation would probably require a significant change to the current process.
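To make the concern concrete, here is a minimal sketch, in plain Python, of how one might flag unit-years where a time-agnostic unit mixes retired and existing generators. The function name find_mixed_status_units and the sample data are hypothetical; the field names (plant_id_eia, generator_id, operational_status) loosely follow generators_eia860, but the record shape and the "year" field are assumptions, not PUDL's actual schema.

```python
from collections import defaultdict

# Hypothetical, time-agnostic unit assignments: (plant_id_eia, generator_id) -> unit_id_pudl
UNIT_MEMBERSHIP = {
    (1, "GEN1"): 1,
    (1, "GEN2"): 1,
    (1, "GEN3"): 2,
}

# Hypothetical yearly status records, loosely modeled on generators_eia860
GENERATOR_STATUS = [
    {"plant_id_eia": 1, "generator_id": "GEN1", "year": 2018, "operational_status": "existing"},
    {"plant_id_eia": 1, "generator_id": "GEN2", "year": 2018, "operational_status": "existing"},
    {"plant_id_eia": 1, "generator_id": "GEN1", "year": 2019, "operational_status": "existing"},
    {"plant_id_eia": 1, "generator_id": "GEN2", "year": 2019, "operational_status": "retired"},
    {"plant_id_eia": 1, "generator_id": "GEN3", "year": 2019, "operational_status": "proposed"},
]


def find_mixed_status_units(membership, status_records):
    """Return sorted (plant_id_eia, unit_id_pudl, year) tuples for units that
    contain both retired and existing generators in the same year."""
    statuses = defaultdict(set)
    for rec in status_records:
        unit = membership.get((rec["plant_id_eia"], rec["generator_id"]))
        if unit is None:
            continue  # generator is not assigned to any unit
        statuses[(rec["plant_id_eia"], unit, rec["year"])].add(rec["operational_status"])
    # Keep only unit-years whose status set includes both "retired" and "existing"
    return sorted(key for key, seen in statuses.items() if {"retired", "existing"} <= seen)


print(find_mixed_status_units(UNIT_MEMBERSHIP, GENERATOR_STATUS))
```

With this sample data, unit 1 is consistent in 2018 but gets flagged for 2019, when GEN2 is retired while GEN1 is still existing. A check like this leaves the time-agnostic assignment untouched and only reports where it conflicts with a given year's status.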

zaneselvans commented 2 years ago

We're currently requiring that Unit ID be a permanent, non-time-varying attribute in the BGA process as well.

If every Unit contains generators, then the table that contains/defines unit_id_pudl can be merged straightforwardly with the generators_eia860 table (where the Unit ID is currently being added post facto via the output routines). It should then be easy to obtain a time-varying description of which generators are part of which unit and whether they were active in a given year, by selecting from the merged table based on operational status.

I think it's simpler and more broadly useful to keep "generator operational status" and "generator unit membership" separate in their most basic forms, and to combine the two tables as needed (maybe with a convenience function) when we need to know which generators were active within a unit at a given time.
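A convenience function along those lines might look like the following minimal sketch, in plain Python. It joins a time-agnostic membership mapping onto yearly status records and keeps only active generators. The name active_generators_by_unit, the sample data, and the choice to treat only "existing" as active are all hypothetical assumptions; in PUDL itself this would more likely be a pandas merge on plant_id_eia and generator_id.

```python
from collections import defaultdict

# Hypothetical, time-agnostic unit assignments: (plant_id_eia, generator_id) -> unit_id_pudl
UNIT_MEMBERSHIP = {
    (1, "GEN1"): 1,
    (1, "GEN2"): 1,
    (1, "GEN3"): 2,
}

# Hypothetical yearly status records, loosely modeled on generators_eia860
GENERATOR_STATUS = [
    {"plant_id_eia": 1, "generator_id": "GEN1", "year": 2018, "operational_status": "existing"},
    {"plant_id_eia": 1, "generator_id": "GEN2", "year": 2018, "operational_status": "existing"},
    {"plant_id_eia": 1, "generator_id": "GEN1", "year": 2019, "operational_status": "existing"},
    {"plant_id_eia": 1, "generator_id": "GEN2", "year": 2019, "operational_status": "retired"},
    {"plant_id_eia": 1, "generator_id": "GEN3", "year": 2019, "operational_status": "proposed"},
]


def active_generators_by_unit(membership, status_records,
                              active_statuses=frozenset({"existing"})):
    """Join unit membership onto yearly generator status and return
    {(plant_id_eia, unit_id_pudl, year): sorted list of active generator_ids}."""
    active = defaultdict(list)
    for rec in status_records:
        unit = membership.get((rec["plant_id_eia"], rec["generator_id"]))
        # Skip generators with no unit and generators not counted as active
        if unit is None or rec["operational_status"] not in active_statuses:
            continue
        active[(rec["plant_id_eia"], unit, rec["year"])].append(rec["generator_id"])
    return {key: sorted(gens) for key, gens in sorted(active.items())}


print(active_generators_by_unit(UNIT_MEMBERSHIP, GENERATOR_STATUS))
```

Because membership and status stay in separate structures, the unit assignment never changes from year to year; only the filtered view does. Passing a different active_statuses set (for example, including "proposed") changes the view without touching either input.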