catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

`pudl.analysis.allocate_gen_fuel` is dropping/adding data #3165

Open grgmiller opened 6 months ago

grgmiller commented 6 months ago

Describe the bug

As part of our OGE validation checks, we compare the generation and fuel totals in the generation_fuel_by_generator_energy_source_monthly_eia923 table to the generation and fuel totals in generation_fuel_eia923. These should match, since the allocate_gen_fuel process should only allocate the data, not add or drop any. This validation check returned the following warning:

[image: validation warning output]

For a small number of plants, it appears that the allocate_gen_fuel pipeline is either adding a large amount of generation or fuel to the totals or subtracting it from them. I haven't yet been able to trace why this might be happening.

Bug Severity

How badly is this bug affecting you?

To Reproduce

Compare the generation and fuel totals from EIA-923 to the data in the generation_fuel_by_generator_energy_source_monthly_eia923 table.

Expected behavior

The total generation and fuel for each plant should be the same before and after allocate_gen_fuel.
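A minimal sketch of the kind of before/after comparison described above, using toy data rather than real PUDL outputs. The column names (`plant_id_eia`, `net_generation_mwh`, `fuel_consumed_mmbtu`), the helper `compare_plant_totals`, and the relative tolerance are assumptions for illustration, not the actual OGE validation code.

```python
import pandas as pd

# Assumed column names; adjust to match the tables being compared.
VALUES = ["net_generation_mwh", "fuel_consumed_mmbtu"]


def compare_plant_totals(before, after, rtol=0.001):
    """Return plants whose totals differ by more than rtol between two tables."""
    b = before.groupby("plant_id_eia")[VALUES].sum()
    a = after.groupby("plant_id_eia")[VALUES].sum()
    # Outer join so a plant missing from either table shows up as 0, not as a gap.
    merged = b.join(a, how="outer", lsuffix="_before", rsuffix="_after").fillna(0)
    flagged = pd.Series(False, index=merged.index)
    for col in VALUES:
        diff = merged[f"{col}_after"] - merged[f"{col}_before"]
        merged[f"{col}_diff"] = diff
        flagged |= diff.abs() > rtol * merged[f"{col}_before"].abs()
    return merged[flagged]


# Toy input (stand-in for the reported generation fuel table).
before = pd.DataFrame({
    "plant_id_eia": [1, 1, 2],
    "net_generation_mwh": [100.0, 50.0, 10.0],
    "fuel_consumed_mmbtu": [1000.0, 500.0, 100.0],
})
# Toy output (stand-in for the allocated table); plant 2 has lost fuel.
after = pd.DataFrame({
    "plant_id_eia": [1, 1, 1, 2],
    "net_generation_mwh": [60.0, 60.0, 30.0, 10.0],
    "fuel_consumed_mmbtu": [600.0, 600.0, 300.0, 40.0],
})

result = compare_plant_totals(before, after)
print(result)
```

Only plants whose totals changed are returned, with a `_diff` column per value showing how much was added (positive) or dropped (negative).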

grgmiller commented 6 months ago

Digging into this further, I started looking into plant 54809, which the validation check shows as missing some fuel consumption data, but not net generation data. This plant has 6 IC petroleum generators and one ST natural gas generator (which reportedly retired in September 2022).

It looks like the source of this error is that the EIA-923 generation fuel table reports the ST generator continuing to consume fuel in Sept-Dec, after it retired, even though there is no generation data reported for these months.

Strangely, this plant reports two different rows for each prime mover/fuel combination, one with the "CHP plant" flag set to yes and one set to no. In September, the fuel consumption data switches from CHP to non-CHP for the same prime mover/fuel.

In this case at least, perhaps dropping this fuel data is correct due to the retirement.
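To look for other cases like this one, a rough sketch of a check that flags fuel reported after every generator with a given prime mover at a plant has retired. It runs on toy data; the column names (`generator_retirement_date`, `prime_mover_code`, `report_date`, `fuel_consumed_mmbtu`) and the helper `fuel_after_retirement` are assumptions for illustration, and matching on plant and prime mover is a simplification.

```python
import pandas as pd

KEYS = ["plant_id_eia", "prime_mover_code"]


def fuel_after_retirement(gen_fuel, gens):
    """Flag fuel rows dated after all generators of that prime mover retired."""
    grp = gens.groupby(KEYS)["generator_retirement_date"]
    retired = grp.agg(
        last_retirement="max", n_gens="size", n_retired="count"
    ).reset_index()
    # Keep only groups where every generator has a retirement date.
    retired = retired[retired["n_gens"] == retired["n_retired"]]
    merged = gen_fuel.merge(retired, on=KEYS, how="inner")
    # Strict ">" treats the retirement month itself as possibly still operating.
    mask = (merged["report_date"] > merged["last_retirement"]) & (
        merged["fuel_consumed_mmbtu"] > 0
    )
    return merged[mask]


# Toy generators table: two operating IC units and one retired ST unit.
gens = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["IC1", "IC2", "ST1"],
    "prime_mover_code": ["IC", "IC", "ST"],
    "generator_retirement_date": pd.to_datetime([None, None, "2022-09-01"]),
})
# Toy generation fuel table with ST fuel reported after retirement.
gen_fuel = pd.DataFrame({
    "plant_id_eia": [1, 1, 1, 1],
    "prime_mover_code": ["ST", "ST", "ST", "IC"],
    "report_date": pd.to_datetime(
        ["2022-08-01", "2022-09-01", "2022-10-01", "2022-10-01"]
    ),
    "fuel_consumed_mmbtu": [10.0, 10.0, 5.0, 20.0],
})

flagged = fuel_after_retirement(gen_fuel, gens)
print(flagged[["plant_id_eia", "prime_mover_code", "report_date"]])
```

Whether flagged rows should be dropped or kept is a separate judgment call; the sketch only surfaces them.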

grgmiller commented 6 months ago

OK, it turns out that one issue was in my comparison of inputs to outputs: I was comparing the outputs to the generation_fuel_eia923 table, which I didn't realize excludes nuclear fuel data. That was the root cause of some of these inconsistencies. After switching to denorm_generation_fuel_combined_eia923, we get the following:

[image: updated validation output]

Plant 10613 is the plant that reports negative fuel consumption in May, so I am assuming this value is getting filtered out as anomalous.

Some of the others (58256, 59817) appear to have inconsistent prime mover codes between the generators table and the generation fuel table, which results in some of this data getting dropped. I had thought we had implemented a check for this in this module. However, these are PV/BA plants with small amounts of generation, so the inconsistent generation/fuel totals should not be a big issue at this point.
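A small sketch of how such prime mover mismatches could be listed, using toy data. It uses an indicator merge to find plant/prime mover pairs that appear in the generation fuel table but not in the generators table. The column names and the helper `unmatched_prime_movers` are assumptions for illustration and are not taken from the allocate_gen_fuel module.

```python
import pandas as pd

KEYS = ["plant_id_eia", "prime_mover_code"]


def unmatched_prime_movers(gen_fuel, gens):
    """Return plant/prime mover pairs in gen_fuel with no match in gens."""
    gf_keys = gen_fuel[KEYS].drop_duplicates()
    gen_keys = gens[KEYS].drop_duplicates()
    # indicator=True adds a "_merge" column marking where each row was found.
    merged = gf_keys.merge(gen_keys, on=KEYS, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", KEYS]


# Toy generation fuel table: plant 1 reports both PV and BA.
gen_fuel = pd.DataFrame({
    "plant_id_eia": [1, 1, 2],
    "prime_mover_code": ["PV", "BA", "ST"],
    "net_generation_mwh": [5.0, 1.0, 100.0],
})
# Toy generators table: plant 1 lists only a PV generator.
gens = pd.DataFrame({
    "plant_id_eia": [1, 2],
    "generator_id": ["G1", "G1"],
    "prime_mover_code": ["PV", "ST"],
})

unmatched = unmatched_prime_movers(gen_fuel, gens)
print(unmatched)
```

Any pair listed here has nothing to be allocated to at the plant/prime mover level, which makes it a candidate for dropped data.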