catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Create notebook exploring bias in load growth projections #3910

Open: zaneselvans opened this issue 1 month ago

zaneselvans commented 1 month ago

Overview

Regulated utilities have a habit of overestimating load growth in order to justify expanding their rate base. @arengel at RMI did a little exploration of this in 2017 in "The Billion Dollar Costs of Forecasting Electricity Demand," and it has become relevant once again with the rush to build gas plants and delay coal plant retirements in order to serve "hyperscale" data centers and AI training. To what extent are utilities simply taking advantage of the hype around this narrative to justify an "emergency" build-out of new fossil infrastructure? Data reported by planning areas in the FERC-714 can provide some context, and would also make a nice example analysis notebook for our PUDL Examples repo.

Outline

Questions

Background Reading

zaneselvans commented 3 weeks ago

Hey @nilaykumar just comment in here if you run into anything strange or have questions about the data or electricity system background (it looks like I can't assign the issue to you until you've engaged with it though)

nilaykumar commented 1 week ago

What are the formal definitions of summer and winter according to Form 714? I looked through the form documentation but couldn't find an answer (maybe I missed it!).

The EIA's glossary defines summer as May through October and winter as November through April. I'll stick with this for the moment, but let me know if you're familiar with the precise definitions.

Edit: and would April 2025, for instance, still count as the winter of 2024?
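A minimal sketch of how that convention could be applied in pandas, assuming the EIA glossary split (May through October is summer) and assuming, as one possible answer to the April question, that a winter is attributed to the year in which it starts, so April 2025 falls in winter 2024. The helper `label_season` and its `summer_months` parameter are hypothetical, not part of PUDL.

```python
import pandas as pd

# Assumed default following the EIA glossary: summer is May (5) through October (10).
# Everything else (November through April) is treated as winter.
EIA_SUMMER_MONTHS = frozenset(range(5, 11))


def label_season(timestamps: pd.Series, summer_months=EIA_SUMMER_MONTHS) -> pd.DataFrame:
    """Label each timestamp with a season and a season year (hypothetical helper).

    Assumes summer_months is a contiguous block of months. Winter months that
    fall before the first summer month (e.g. January through April) are
    attributed to the winter that started in the previous calendar year.
    """
    ts = pd.to_datetime(timestamps)
    month = ts.dt.month
    is_summer = month.isin(summer_months)
    season = is_summer.map({True: "summer", False: "winter"})
    # Early-year winter months belong to the winter that began the year before.
    starts_previous_year = ~is_summer & (month < min(summer_months))
    season_year = ts.dt.year - starts_previous_year.astype(int)
    return pd.DataFrame({"season": season, "season_year": season_year})
```

Because the summer months are a parameter, a different regional cutoff can be swapped in later without touching the rest of the analysis.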

Edit #2: Aha, I should have checked the data dictionary:

zaneselvans commented 1 week ago

I suspect that the column descriptions in the data dictionary ultimately came from the EIA column definitions even though they're in a FERC table. Unfortunately the FERC-714 instructions are totally vague on the summer/winter definition, which could mean that every respondent is applying their own criteria and it's not standardized.

This EIA post from 2020 gives a little insight into how peaks vary by region, month, and hour. The "summer peaking" pattern is a single daily peak late in the afternoon, driven by air conditioning, while the "winter peaking" pattern is two (smaller) daily peaks, in the morning and evening, driven by heating. When the load curve shifts between these two patterns differs from region to region. E.g. the US Southwest has a "summer" style pattern in all of April, July, and October, and only has the winter pattern in January, but the Northeast has a winter pattern in all of October, January, and April, and only looks like "summer" in July. So maybe it's not unreasonable that respondents in different climatic regions choose different cutoffs?

For the purposes of this visualization and analysis it probably doesn't matter too much. If we make the summer/winter cutoff dates a parameter, we can tweak them later if need be. And looking at all those regional curves, the "winter" peaking demand pattern is always highest in January while the "summer" peaking pattern is always highest in July, so windows that exclude the shoulder seasons are probably fine. Initially it's probably simpler to just use a global annual peak rather than calling out the summer and winter peaks separately. It looks like the RMI analysis didn't differentiate between them.
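For the global peak, something like the following sketch would give one peak value per respondent and year. The column names `respondent_id_ferc714`, `datetime_utc`, and `demand_mwh` are assumptions modeled on PUDL's hourly FERC-714 demand table and should be checked against the data dictionary; `annual_peak_demand` is a hypothetical helper.

```python
import pandas as pd


def annual_peak_demand(hourly: pd.DataFrame) -> pd.DataFrame:
    """Global (all-season) peak hourly demand per respondent and calendar year.

    Assumed columns: respondent_id_ferc714, datetime_utc, demand_mwh.
    Years are taken from UTC timestamps, so a few local hours around New Year
    may land in the adjacent year; that edge effect is ignored here.
    """
    year = pd.to_datetime(hourly["datetime_utc"]).dt.year.rename("report_year")
    return (
        hourly.groupby(["respondent_id_ferc714", year])["demand_mwh"]
        .max()
        .rename("peak_demand_mwh")
        .reset_index()
    )
```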

Are you actually seeing winter peaks that happen in December for some respondents?

nilaykumar commented 1 week ago

Thanks for the detailed explanation! This question about varying summer/winter designations by geo is an interesting one, but agreed -- it makes sense to stick with a simple global peak for now.

I am actually seeing peaks throughout the year, but I might be wrangling the data incorrectly. My notebook is here and I believe it should be visible (I'm new to Kaggle, so let me know if there's something missing). I've got a simple histogram of the 10-year-forecast-vs-realized over-forecast percentage at the bottom. Hopefully that looks reasonable.
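One way the over-forecast percentage could be computed, as a hedged sketch: match each forecast made in `report_year` for `forecast_year = report_year + horizon` against the peak actually observed in `forecast_year`. The table layouts and the column names `peak_forecast_mw` and `peak_demand_mw` are hypothetical stand-ins, not the exact PUDL schema.

```python
import pandas as pd


def over_forecast_pct(forecasts: pd.DataFrame, realized: pd.DataFrame, horizon: int = 10) -> pd.DataFrame:
    """Percent by which a peak forecast made `horizon` years ahead exceeded the realized peak.

    Hypothetical inputs:
      forecasts: respondent_id_ferc714, report_year, forecast_year, peak_forecast_mw
      realized:  respondent_id_ferc714, report_year, peak_demand_mw
    Positive values mean the forecast was too high.
    """
    # Keep only forecasts made exactly `horizon` years before their target year.
    fc = forecasts.loc[forecasts["forecast_year"] - forecasts["report_year"] == horizon]
    merged = fc.merge(
        realized.rename(columns={"report_year": "forecast_year"}),
        on=["respondent_id_ferc714", "forecast_year"],
        how="inner",  # drop forecasts whose target year hasn't been observed yet
    )
    merged["over_forecast_pct"] = (
        100 * (merged["peak_forecast_mw"] - merged["peak_demand_mw"]) / merged["peak_demand_mw"]
    )
    return merged
```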

zaneselvans commented 1 week ago

Yes, I can access the notebook!

I'm suspicious of the relatively flat distribution of months in which peak demand occurred. I would expect it to be primarily centered around a summer peak in July or August, with a smaller set of planning areas (if any) peaking in ~January.

peak_month
8     18722
7     18182
6     17323
1     16075
5     15973
12    15964
3     15864
10    15634
4     15503
9     15495
11    15176
2     14254
Name: count, dtype: int64

One thing that might be happening is that the planning areas reporting FERC-714 vary wildly in size, and the smaller ones probably have a much more variable pattern of demand. You might try looking only at planning areas above a certain total demand threshold. The out_ferc714__summarized_demand table has some annual summary statistics for the FERC-714 respondents, which you can use to identify the respondent IDs with larger demand and focus on those.
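A sketch of that filtering step, assuming the summary table has one row per respondent and year with an annual demand column. The column name `demand_annual_mwh` is an assumption, `large_respondents` is a hypothetical helper, and the threshold is a free parameter rather than a recommended value. Using the median across years keeps one anomalous year from pulling a respondent in or out.

```python
import pandas as pd


def large_respondents(summary: pd.DataFrame, min_annual_mwh: float) -> set:
    """IDs of respondents whose median annual demand is at least `min_annual_mwh`.

    `summary` stands in for out_ferc714__summarized_demand; the columns
    respondent_id_ferc714 and demand_annual_mwh are assumed names.
    """
    median_demand = summary.groupby("respondent_id_ferc714")["demand_annual_mwh"].median()
    return set(median_demand.index[median_demand >= min_annual_mwh])
```

The resulting IDs can then restrict the hourly data, e.g. `hourly[hourly["respondent_id_ferc714"].isin(ids)]`.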

Also, the number of peak values being reported in peak_month seems very large. There are only ~200 respondents and 18 years of data, so there should be a total of ~3,600 instances of peak annual demand (200 × 18). That assumes each peak is unique, which I guess it won't always be, but should it really be as non-unique as it appears to be here? It looks like there are ~200,000 instances of actual demand matching peak demand.
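A quick way to check for this, sketched with the same assumed column names as above (`peak_tie_diagnostic` is a hypothetical helper): compare the number of respondent-years with the number of hours whose demand equals that respondent-year's maximum. A respondent-year with demand reported as all zeros contributes every one of its hours as a "peak", which alone can inflate the count dramatically.

```python
import pandas as pd


def peak_tie_diagnostic(hourly: pd.DataFrame) -> dict:
    """Compare respondent-years with the number of hours equal to the annual peak.

    Assumed columns: respondent_id_ferc714, datetime_utc, demand_mwh.
    """
    year = pd.to_datetime(hourly["datetime_utc"]).dt.year
    keys = [hourly["respondent_id_ferc714"], year]
    # Broadcast each respondent-year's maximum back onto its hourly rows.
    annual_max = hourly.groupby(keys)["demand_mwh"].transform("max")
    at_peak = hourly["demand_mwh"] == annual_max
    return {
        "respondent_years": hourly.groupby(keys).ngroups,
        "hours_at_peak": int(at_peak.sum()),
    }
```

If `hours_at_peak` is far larger than `respondent_years`, ties (or all-zero years) are the likely culprit.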

It might be good to spot check a couple of big regional respondents and make sure they look reasonable. E.g. the California ISO and ERCOT should both have a clear summer peak. Maybe aggregate to the max value per day and plot those curves to see what the seasonal patterns look like for various planning areas.
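A minimal sketch of the daily aggregation, again with assumed column names and a hypothetical helper name. Note that it groups on UTC calendar days; converting to each planning area's local time zone first would line the days up with local load patterns, but that step is omitted here.

```python
import pandas as pd


def daily_max_demand(hourly: pd.DataFrame) -> pd.DataFrame:
    """Maximum hourly demand per respondent per UTC calendar day.

    Assumed columns: respondent_id_ferc714, datetime_utc, demand_mwh.
    """
    day = pd.to_datetime(hourly["datetime_utc"]).dt.floor("D").rename("date")
    return (
        hourly.groupby([hourly["respondent_id_ferc714"], day])["demand_mwh"]
        .max()
        .rename("daily_max_mwh")
        .reset_index()
    )
```

The result could then be reshaped with `DataFrame.pivot(index="date", columns="respondent_id_ferc714", values="daily_max_mwh")` and plotted with `.plot()` (which needs matplotlib) to compare seasonal shapes for, say, CAISO and ERCOT.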

zaneselvans commented 1 week ago

I think the histogram looks generally like what I would expect. More or less centered around 0, but with a right-skew.

Given the wide range of total demand in the different planning areas, it might be more informative to do a histogram that's weighted either by peak demand or total demand.
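A weighted version could look like this sketch, which passes the `weights` argument to `numpy.histogram` so each planning area contributes in proportion to, say, its realized peak demand rather than counting once. The helper name, the 10-percentage-point bin width, and the choice of weight are all assumptions to tune.

```python
import numpy as np


def weighted_error_histogram(errors_pct, weights, bin_width=10.0):
    """Weighted histogram of over-forecast percentages.

    Returns the bin edges and the share of total weight in each bin, so a
    large planning area counts for more than a small one.
    """
    errors = np.asarray(errors_pct, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Align bin edges on multiples of bin_width that cover every error value.
    lo = np.floor(errors.min() / bin_width) * bin_width
    hi = np.ceil(errors.max() / bin_width) * bin_width
    if hi <= lo:
        hi = lo + bin_width  # guarantee at least one bin
    edges = np.arange(lo, hi + bin_width / 2, bin_width)
    counts, edges = np.histogram(errors, bins=edges, weights=w)
    return edges, counts / w.sum()
```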

nilaykumar commented 1 week ago

Nice catch, I had a lot of duplicates there from demand numbers that were either identically zero or hitting the yearly peak quite often (e.g. during the summer). Dropping duplicates appropriately seems to give more reasonable results (though there are still a decent number of peaks in December, for example):

peak_month
7     601
8     574
1     444
6     207
12    124
2      90
9      80
11     54
3      19
4      13
5      10
10     10
Name: count, dtype: int64
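The de-duplication described in this comment could be sketched as follows, with column names assumed as before and `peak_month_counts` a hypothetical helper: drop zero-demand hours, then use `idxmax`, which returns the first row holding the maximum, so each respondent-year contributes exactly one peak hour regardless of ties. Treating zero demand as missing data is an assumption.

```python
import pandas as pd


def peak_month_counts(hourly: pd.DataFrame) -> pd.Series:
    """Count respondent-years by the month of their single annual peak hour."""
    # Assumption: zero demand means missing data, so those hours are dropped.
    df = hourly.loc[hourly["demand_mwh"] > 0].reset_index(drop=True)
    df["datetime_utc"] = pd.to_datetime(df["datetime_utc"])
    df["year"] = df["datetime_utc"].dt.year
    # idxmax returns the label of the first row with the maximum value, so
    # tied peaks collapse to one hour per respondent-year.
    peak_rows = df.groupby(["respondent_id_ferc714", "year"])["demand_mwh"].idxmax()
    peak_month = df.loc[peak_rows, "datetime_utc"].dt.month.rename("peak_month")
    return peak_month.value_counts()
```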

I've started to sketch out some plots similar to the RMI plot. I'm not familiar enough with the data yet to have much confidence in them (the peak-weighted curve is all over the place), but I'm getting there!

zaneselvans commented 1 week ago

Okay, that distribution looks much more like I would expect -- almost all the peaks are in clear summer or winter months and not the shoulder seasons.
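As a rough check on that impression, using only the de-duplicated counts posted above and assuming June through August as core summer and December through February as core winter (a choice made here for illustration, not a FERC or EIA definition):

```python
# Peak-month counts copied from the de-duplicated value_counts() output above.
reported_peak_months = {7: 601, 8: 574, 1: 444, 6: 207, 12: 124, 2: 90,
                        9: 80, 11: 54, 3: 19, 4: 13, 5: 10, 10: 10}

# Assumed "core" seasons for this check only.
CORE_SUMMER = {6, 7, 8}
CORE_WINTER = {12, 1, 2}

total = sum(reported_peak_months.values())
summer = sum(n for m, n in reported_peak_months.items() if m in CORE_SUMMER)
winter = sum(n for m, n in reported_peak_months.items() if m in CORE_WINTER)
shoulder = total - summer - winter

print(f"summer {summer / total:.1%}, winter {winter / total:.1%}, shoulder {shoulder / total:.1%}")
```

With those counts, 2,040 of 2,226 respondent-year peaks (about 92%) fall in the core summer or winter months.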

zaneselvans commented 2 days ago

Annnnd now Georgia Power has an even more bonkers load projection that would triple its overall generating capacity by 2030, almost entirely driven by datacenter loads. Just deranged fantasy. The IRP will be 🍿🔥