Open cmgosnell opened 1 year ago
I did some debugging sleuthing and found that the unit_ids that were trying to be merged into gens_df in _append_masked_units had multiple proposed units for a small number of generators due to having multiple PMs over time
@cmgosnell is the problem here that these should be merged into the non-static gens table vs. the entity generators table? Or is the problem that unit ids should not vary by generator at all?
When the unit id code was written, we expected PMs to be static. We have since transitioned PMs to be an annually varying generator attribute, which breaks the expectations built into this whole step.
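To make the failure mode concrete, here is a minimal sketch with invented toy data. It is not the actual PUDL code: `pm_to_unit` and `merge_units` are hypothetical stand-ins, and the only thing borrowed from PUDL is the column names. It shows how a unit ID proposal that depends on the PM code yields two candidate units for one generator once the PM code changes between years, so a merge that expects one unit per generator fails.

```python
import pandas as pd

# Toy annual generator records (invented data, PUDL-style column names).
gens_annual = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1],
        "generator_id": ["A", "A", "B", "B"],
        "report_date": pd.to_datetime(
            ["2019-01-01", "2020-01-01", "2019-01-01", "2020-01-01"]
        ),
        # Generator A changes PM code between years; generator B does not.
        "prime_mover_code": ["GT", "CT", "ST", "ST"],
    }
)

# Hypothetical stand-in for any unit ID proposal that is keyed on the PM code.
pm_to_unit = {"GT": 1, "CT": 2, "ST": 3}
proposed = gens_annual.assign(
    unit_id_pudl=gens_annual["prime_mover_code"].map(pm_to_unit)
)[["plant_id_eia", "generator_id", "unit_id_pudl"]].drop_duplicates()

# One row per generator, as a static generator table would have.
gens_static = gens_annual[["plant_id_eia", "generator_id"]].drop_duplicates()


def merge_units(gens: pd.DataFrame, units: pd.DataFrame) -> pd.DataFrame:
    """Attach proposed unit IDs, insisting on one unit per generator."""
    return gens.merge(
        units,
        on=["plant_id_eia", "generator_id"],
        how="left",
        validate="one_to_one",
    )


try:
    merge_units(gens_static, proposed)
    merge_failed = False
except pd.errors.MergeError:
    # Generator A has two proposed units, so the one-to-one check fails.
    merge_failed = True
```

In this toy case `merge_failed` ends up `True` because generator A carries two proposed `unit_id_pudl` values.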
I honestly don't know what the answer is here. It is possible that we have to allow the unit id to vary annually. Or we could ignore these PM-time-varying generators in this process, because the PM code doesn't vary for >95% of generators. Or we need to re-jigger the unit id code to enable time-varying PM codes while keeping unit ids static.
If I were doing this, I'd take a minute to survey a handful of time-varying PM plants, convert those plants into little unit tests, and start building an understanding of which of the various endpoints we should be shooting for.
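A rough sketch of how that survey could start, assuming an annual generators dataframe with `plant_id_eia`, `generator_id`, and `prime_mover_code` columns. The helper name `find_time_varying_pm` and the toy data are made up for illustration; in practice the input would be whatever annual generators table is at hand.

```python
import pandas as pd


def find_time_varying_pm(gens: pd.DataFrame) -> pd.DataFrame:
    """Return generators that report more than one distinct PM code."""
    pm_counts = (
        gens.dropna(subset=["prime_mover_code"])
        .groupby(["plant_id_eia", "generator_id"])["prime_mover_code"]
        .nunique()
    )
    return pm_counts[pm_counts > 1].rename("n_pm_codes").reset_index()


# Invented toy data: A changes PM code, B is constant, C has a missing year.
gens = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 1, 1, 1],
        "generator_id": ["A", "A", "B", "B", "C", "C"],
        "report_date": pd.to_datetime(
            ["2019-01-01", "2020-01-01"] * 3
        ),
        "prime_mover_code": ["GT", "CT", "ST", "ST", "ST", None],
    }
)

varying = find_time_varying_pm(gens)

# Share of generators whose PM code varies over time.
n_gens = len(gens[["plant_id_eia", "generator_id"]].drop_duplicates())
share = len(varying) / n_gens
```

The rows in `varying` are candidates for small unit-test fixtures, and `share` gives a quick check on how rare the time-varying case really is.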
Describe the bug
A while back @zaneselvans developed another step in our pursuit of complete coverage of unit_id_pudl #1037. We didn't actually turn it on in our tests, so we didn't notice that it is not compatible with the change we made to how we normalize the prime_mover_code: we determined that the PM code should actually be annually varying, so it now gets harvested into the annually varying generators_eia860 table instead of the static generators_entity_eia table.
This change is incompatible with the way the current pudl.output.eia860.fill_unit_ids works during the assign_single_gen_unit_ids step. This could very well affect other stages of the unit id assignment as well!
To Reproduce
Steps to reproduce the behavior -- ideally including a code snippet that causes the error to appear.
settings.yml file you're using to specify which data to load, and make a note of where in the ETL process the error is happening.
With an all-year pudl.sqlite:
Error:
I did some debugging sleuthing and found that the unit_ids that were trying to be merged into gens_df in _append_masked_units had multiple proposed units for a small number of generators due to having multiple PMs over time:
Expected behavior
Preferably, we could run this function to make lotsa unit_id_pudl's 😎 This will be helpful in a number of ways, but in particular for making the subplant_id to glue epacamd and eia (see #2491).