dt-woods opened 3 months ago
@m-jamieson, why is 'Net Generation (Megawatthours)' scaled by 10 here?
That's a great question; I don't know why it would be. What kind of results does it give? It should be either 0-1 or 0-100. If it were up to me, it would be 0-1; I'm not really a fan of 0-100 percentages.
I guess the point is, if it's giving the right answers, I'm not sure I would change it. I'm also not sure whether efficiency is used. I think the filters are based on heat rate (Btu/kWh), so maybe it's not strictly needed (although it may be helpful for anyone looking at the data).
While we're fixing that, it would probably also be nice to change the hard-coded Btu-to-MWh conversion that's in the same equation.
The calculated "efficiency" is used as a filter, controlled by a model configuration setting (egrid_facility_efficiency_filters) whose lower and upper bounds are set to 10 and 100, respectively.
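A minimal sketch of how such a bounds filter might be applied, assuming the efficiency is stored in percent in a pandas column named `efficiency` and that rows outside the bounds are dropped. The helper name and the inclusive edges are hypothetical, not the repository's actual implementation.

```python
import pandas as pd


def apply_efficiency_filter(df: pd.DataFrame, lower: float = 10.0, upper: float = 100.0) -> pd.DataFrame:
    """Keep rows whose 'efficiency' (percent) lies within [lower, upper].

    Hypothetical helper: the defaults mirror the configured bounds (10 and 100),
    but the real code may treat the edges or missing values differently.
    """
    mask = df["efficiency"].between(lower, upper)  # inclusive on both ends by default
    return df[mask]


plants = pd.DataFrame({"plant": ["A", "B", "C", "D"], "efficiency": [5.0, 35.0, 100.0, 250.0]})
kept = apply_efficiency_filter(plants)
print(kept["plant"].tolist())  # ['B', 'C']
```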
The efficiency results (based on the latest push linked above) are limited to a minimum of zero (no negatives, no NaNs, no Infs). There is no upper bound, due to the nature of the calculation:
$$ \frac{\mathrm{Net\ generation\ (MWh)} \times 10}{\mathrm{fuel\ consumption\ (MMBtu)}\times \frac{3.412\ \mathrm{MMBtu}}{1\ \mathrm{MWh}}} \times 100 $$
The largest values run into the thousands and tens of thousands of percent (the single maximum is over one million percent).
In [27]: gen_efficiency['efficiency'].describe()
Out[27]:
count 1.641200e+04
mean 2.270501e+02
std 1.349595e+04
min 0.000000e+00
25% 2.395281e+01
50% 4.269409e+01
75% 8.589776e+01
max 1.215349e+06
Name: efficiency, dtype: float64
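To see why nothing caps the result from above, here is a minimal sketch that reproduces the equation as written, assuming (per the description above) that negative, NaN, and infinite results are replaced by zero. The function name is hypothetical and the real code's clipping details may differ.

```python
import math

MMBTU_PER_MWH = 3.412  # constant as it appears in the equation above


def current_efficiency(net_gen_mwh: float, fuel_mmbtu: float) -> float:
    """Evaluate the current equation, clipped at zero.

    Assumption: negative, NaN, and infinite results become 0,
    matching 'no negatives, no NaNs, no Infs'.
    """
    try:
        eff = (net_gen_mwh * 10) / (fuel_mmbtu * MMBTU_PER_MWH) * 100
    except ZeroDivisionError:
        return 0.0  # zero fuel consumption: result undefined, clipped to 0
    if math.isnan(eff) or math.isinf(eff) or eff < 0:
        return 0.0
    return eff


# A plausible-looking plant versus one reporting almost no fuel use:
print(round(current_efficiency(1000, 10000), 2))
print(round(current_efficiency(1000, 1), 2))  # explodes; no upper bound
```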
Based on the units listed in EIA923_Schedules_2_3_4_5_M_12_2022_Final.xlsx, page 7 ("File Layout"), for the two quantities used in the efficiency calculation (net generation in MWh, fuel consumption in MMBtu):
$$ \frac{\mathrm{Net\ generation\ (MWh)} \times 10}{\mathrm{fuel\ consumption\ (MMBtu)}\times \frac{3.412\ \mathrm{MMBtu}}{1\ \mathrm{MWh}}} \times 100 $$
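Notice that the unit conversion is wrong in the current setup :worried: Multiplying fuel consumption (MMBtu) by 3.412 MMBtu/MWh gives MMBtu²/MWh rather than MWh, so the ratio is not dimensionless.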
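For reference, one dimensionally consistent form (a sketch, not a quote of the committed code) converts fuel consumption from MMBtu to MWh by multiplying by 1 MWh / 3.412 MMBtu, so the units cancel without any extra factor of 10. For contrast, the current form has units of MWh / (MMBtu × MMBtu/MWh) = MWh²/MMBtu².

$$ \frac{\mathrm{Net\ generation\ (MWh)}}{\mathrm{fuel\ consumption\ (MMBtu)}\times \frac{1\ \mathrm{MWh}}{3.412\ \mathrm{MMBtu}}} \times 100 $$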
After removing the 10x scaling factor and using the correct unit conversion from MMBtu to MWh:
In [21]: gen_efficiency['efficiency'].describe()
Out[21]:
count 1.641200e+04
mean 2.643369e+02
std 1.571230e+04
min 0.000000e+00
25% 2.788641e+01
50% 4.970543e+01
75% 1.000041e+02
max 1.414937e+06
Name: efficiency, dtype: float64
The quantiles look better, but there are still some extremely large values.
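As an arithmetic consistency check on the two summaries above: dropping the ×10 and moving 3.412 from a multiplier to a divisor of fuel consumption should scale every efficiency by the same factor, 3.412²/10 ≈ 1.164. The sketch below compares that expected factor to the observed before/after ratios, using only the numbers reported in this thread.

```python
OLD_FACTOR = 10 / 3.412  # current: gen * 10 / (fuel * 3.412)
NEW_FACTOR = 3.412       # corrected: gen / (fuel / 3.412)
expected_ratio = NEW_FACTOR / OLD_FACTOR  # equals 3.412**2 / 10

# Statistics copied from the two describe() outputs above.
before = {"mean": 2.270501e02, "25%": 2.395281e01, "50%": 4.269409e01,
          "75%": 8.589776e01, "max": 1.215349e06}
after = {"mean": 2.643369e02, "25%": 2.788641e01, "50%": 4.970543e01,
         "75%": 1.000041e02, "max": 1.414937e06}

for stat in before:
    observed = after[stat] / before[stat]
    print(f"{stat}: observed {observed:.5f} vs expected {expected_ratio:.5f}")
```

Every statistic scales by the same factor (within about 0.01%), which is consistent with the fix being a pure rescaling: it moves the quantiles but cannot remove the extreme outliers.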
I don't see any references to heat rate; however, if you wanted efficiency in units of Btu/kWh, that might explain where a scaling factor comes from (the millions cancel in MMBtu/MWh, leaving Btu/Wh, which must be multiplied by 1,000 to get Btu/kWh; the ×10 and ×100 in the current equation also multiply to 1,000). But the calculation is inverted (kWh/Btu).
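A short sketch of the heat-rate arithmetic, for anyone comparing the two conventions. It relies only on the standard equivalence 1 kWh ≈ 3,412.14 Btu; the function names are illustrative.

```python
BTU_PER_KWH = 3412.14  # energy content of 1 kWh expressed in Btu


def heat_rate_btu_per_kwh(fuel_mmbtu: float, net_gen_mwh: float) -> float:
    """Heat rate (fuel in per unit of electricity out).

    MMBtu/MWh reduces to Btu/Wh (the millions cancel), and 1 Btu/Wh = 1,000 Btu/kWh.
    """
    return fuel_mmbtu / net_gen_mwh * 1000


def efficiency_from_heat_rate(hr_btu_per_kwh: float) -> float:
    """Thermal efficiency in percent: 3,412 Btu/kWh divided by the heat rate."""
    return BTU_PER_KWH / hr_btu_per_kwh * 100


hr = heat_rate_btu_per_kwh(fuel_mmbtu=100_000, net_gen_mwh=10_000)
print(hr)                                       # 10000.0
print(round(efficiency_from_heat_rate(hr), 2))  # 34.12
```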
Don't worry about modifying anything to use the heat rate; I'm happy with using the percentage. I'm sure it was a mix of all the things above, perhaps previously filtering by heat rate and then switching to percentage.
I'm still not convinced by the efficiencies being calculated, as summarized in your comment above. I just did it by hand for 2020 (EIA923, pivoting by plant ID and state and summing net generation and total fuel consumption in MMBtu). I get the following:
Statistic | Efficiency |
---|---|
Mean | 40.4% |
Min | -6,369% |
10% | 22.2% |
25% | 33.2% |
50% | 38.9% |
75% | 38.9% |
90% | 38.9% |
Max | 36,729% |
There's a strong peak around 39% because hydro, solar, wind (and probably more) use the same "efficiency" to calculate the "heat input". Also, my pivot only results in 10,468 entries. I get similar results for 2016, except that 37% is the efficiency used for renewables. See the histogram for 2020 below.
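The by-hand check can be sketched in pandas as below. The column names "Plant Id" and "Plant State" and the toy rows are assumptions standing in for the real EIA-923 data; the efficiency uses the dimensionally consistent form (net generation × 3.412 / fuel consumption × 100) without clipping, which is why a negative net generation shows up as a negative efficiency, as in the manual table.

```python
import pandas as pd

MMBTU_PER_MWH = 3.412

# Toy rows standing in for EIA-923 generation records (values are made up).
eia923 = pd.DataFrame({
    "Plant Id": [1, 1, 2, 3],
    "Plant State": ["AL", "AL", "TX", "TX"],  # assumed column name
    "Net Generation (Megawatthours)": [500.0, 500.0, 2000.0, -10.0],
    "Total Fuel Consumption MMBtu": [4000.0, 4000.0, 17544.0, 50.0],
})

# Pivot by plant ID and state, summing generation and fuel consumption.
plant_totals = eia923.pivot_table(
    index=["Plant Id", "Plant State"],
    values=["Net Generation (Megawatthours)", "Total Fuel Consumption MMBtu"],
    aggfunc="sum",
)
plant_totals["efficiency"] = (
    plant_totals["Net Generation (Megawatthours)"] * MMBTU_PER_MWH
    / plant_totals["Total Fuel Consumption MMBtu"] * 100
)
print(plant_totals["efficiency"].describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9]))
```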
Re-ran for 2020 EIA 923:
In [14]: gen_efficiency.describe()
Out[14]:
Statistic | Total Fuel Consumption MMBtu | Net Generation (Megawatthours) | efficiency |
---|---|---|---|
count | 9.154000e+03 | 9.154000e+03 | 9154.000000 |
mean | 3.737781e+06 | 4.056320e+05 | 42.601295 |
std | 1.618635e+07 | 1.679260e+06 | 516.474022 |
min | 0.000000e+00 | -8.454800e+05 | 0.000000 |
25% | 2.319000e+04 | 2.514500e+03 | 34.901318 |
50% | 8.148050e+04 | 9.108000e+03 | 38.920222 |
75% | 9.935818e+05 | 1.019032e+05 | 38.920410 |
max | 3.295960e+08 | 3.155243e+07 | 36728.903669 |
Results appear the same as (or rather, similar to) yours, Matt.
Then again for 2021:
In [20]: gen_efficiency.describe()
Out[20]:
Statistic | Total Fuel Consumption MMBtu | Net Generation (Megawatthours) | efficiency |
---|---|---|---|
count | 9.711000e+03 | 9.711000e+03 | 9711.000000 |
mean | 3.641924e+06 | 3.934643e+05 | 45.273045 |
std | 1.606625e+07 | 1.656508e+06 | 577.061684 |
min | 0.000000e+00 | -7.503480e+05 | 0.000000 |
25% | 2.414850e+04 | 2.619000e+03 | 38.068001 |
50% | 8.209200e+04 | 9.124000e+03 | 38.581376 |
75% | 9.694300e+05 | 1.019680e+05 | 38.581573 |
max | 3.298714e+08 | 3.162986e+07 | 37253.874224 |
Similar to 2020.
And again for 2022.
In [23]: gen_efficiency.describe()
Out[23]:
Statistic | Total Fuel Consumption MMBtu | Net Generation (Megawatthours) | efficiency |
---|---|---|---|
count | 1.032100e+04 | 1.032100e+04 | 1.032100e+04 |
mean | 2.292835e+06 | 3.817568e+05 | 2.353918e+02 |
std | 1.008945e+07 | 1.612123e+06 | 1.400607e+04 |
min | 0.000000e+00 | -1.121756e+06 | 0.000000e+00 |
25% | 1.025700e+04 | 2.759000e+03 | 4.434713e+01 |
50% | 3.184800e+04 | 8.877000e+03 | 1.000032e+02 |
75% | 4.389610e+05 | 9.959000e+04 | 1.000045e+02 |
max | 2.137728e+08 | 3.194279e+07 | 1.414937e+06 |
The problem appears to be something in the 2022 EIA 923.
2022 efficiencies by fuel category:
In [32]: final_gen_df.groupby(by="FuelCategory")['efficiency'].agg(['count', 'min', 'mean', 'std', 'max'])
Out[32]:
FuelCategory | count | min | mean | std | max |
---|---|---|---|---|---|
BIOMASS | 327 | 0.000000 | 27.377079 | 6.936788 | 8.061011e+01 |
COAL | 164 | 7.228735 | 30.450537 | 3.811412 | 3.841347e+01 |
GAS | 1154 | 0.000000 | 81.601967 | 1099.427468 | 2.772934e+04 |
GEOTHERMAL | 60 | 100.002721 | 100.004127 | 0.000512 | 1.000062e+02 |
HYDRO | 1344 | 0.000000 | 99.209471 | 57.400020 | 1.756620e+03 |
MIXED | 590 | 0.000000 | 13.669213 | 22.168856 | 2.111309e+02 |
NUCLEAR | 54 | 0.000000 | 28919.141072 | 193205.578104 | 1.414937e+06 |
OIL | 564 | 0.000000 | 28.478144 | 25.958422 | 3.834597e+02 |
OTHF | 103 | 0.000000 | 7.767254 | 26.896938 | 1.000054e+02 |
SOLAR | 4714 | 0.000000 | 99.885302 | 2.350605 | 1.137381e+02 |
SOLARTHERMAL | 9 | 74.325506 | 91.637267 | 11.480540 | 1.000042e+02 |
WIND | 1238 | 83.198613 | 99.985083 | 0.484713 | 1.001167e+02 |
Nuclear seems to be a problem. Also, the renewables are much higher than 37% (e.g., >90%).
Compare this to the 2020 efficiencies by fuel category:
In [35]: final_gen_df.groupby(by="FuelCategory")['efficiency'].agg(['count', 'min', 'mean', 'std', 'max'])
Out[35]:
FuelCategory | count | min | mean | std | max |
---|---|---|---|---|---|
BIOMASS | 349 | 0.000000 | 27.183248 | 6.025765 | 75.683414 |
COAL | 184 | 7.495489 | 30.233958 | 3.770085 | 38.787972 |
GAS | 1099 | 0.000000 | 98.903593 | 1488.364555 | 36728.903669 |
GEOTHERMAL | 57 | 38.919772 | 38.920269 | 0.000097 | 38.920464 |
HYDRO | 1351 | 0.000000 | 38.482296 | 15.811447 | 439.955311 |
MIXED | 602 | 0.000000 | 13.903016 | 84.259012 | 2043.019803 |
NUCLEAR | 57 | 32.489278 | 32.660164 | 0.024329 | 32.664691 |
OIL | 554 | 0.000000 | 25.445353 | 12.877129 | 91.396651 |
OTHF | 84 | 0.000000 | 3.706687 | 11.493393 | 38.920363 |
SOLAR | 3648 | 14.627729 | 38.897192 | 0.475619 | 40.773906 |
SOLARTHERMAL | 8 | 33.913723 | 37.011544 | 2.504891 | 38.920302 |
WIND | 1161 | 38.188003 | 38.920573 | 0.042804 | 40.142843 |
So clearly 2022 renewables are all at ~90+% efficiency. Those were pretty meaningless to begin with, so I'm not terribly concerned about those. Something is way off on the gas for both 2020 and 2022 though.
Gas is very weird for both years. Manually checking 2022 EIA 923, I'm getting a mean of ~31% for gas plant efficiency. If I filter by sectors 1 and 2, as would be expected I end up with a mean of 36.8% for gas, which makes sense to me.
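The sector filter can be sketched as follows. The column name "EIA Sector Number" and the toy values are assumptions; in EIA's sector coding, sectors 1 and 2 are electric utilities and non-cogeneration independent power producers, so cogeneration plants (whose fuel also serves useful heat) drop out.

```python
import pandas as pd

# Toy plant-level rows; values are made up for illustration.
plants = pd.DataFrame({
    "FuelCategory": ["GAS", "GAS", "GAS", "COAL"],
    "EIA Sector Number": [1, 2, 7, 1],  # assumed column name
    "efficiency": [40.0, 33.6, 5.0, 30.0],
})

gas = plants[plants["FuelCategory"] == "GAS"]
print(gas["efficiency"].mean())  # all sectors

# Keep only sectors 1 and 2 before averaging.
power_sector_gas = gas[gas["EIA Sector Number"].isin([1, 2])]
print(power_sector_gas["efficiency"].mean())
```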
Maybe something strange is going on when reading the EIA923 files?
I glanced at some gas efficiencies while looking at some of the interim data frames, and they seemed okay. As an additional check, the share of gas in the US generation mix seems about right, suggesting that not many plants are being caught by this filter. Regardless, I've made a commit with a hotfix of sorts that ignores the efficiencies for renewables and nuclear, since the efficiencies for these technologies were never really meaningful in the first place.
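One way such a hotfix could look, as a sketch: apply the efficiency bounds only to categories where combustion efficiency is meaningful, and let the others pass through. The exempt set and helper name are assumptions, not the contents of the actual commit; the toy efficiencies are borrowed from the 2022 table above.

```python
import pandas as pd

# Assumed set of categories whose "efficiency" is not physically meaningful
# (non-combustion renewables plus nuclear). The hotfix's actual list may differ.
EXEMPT_CATEGORIES = {"HYDRO", "SOLAR", "SOLARTHERMAL", "WIND", "GEOTHERMAL", "NUCLEAR"}


def filter_by_efficiency(df: pd.DataFrame, lower: float = 10.0, upper: float = 100.0) -> pd.DataFrame:
    """Apply the efficiency bounds only to non-exempt fuel categories (hypothetical helper)."""
    exempt = df["FuelCategory"].isin(EXEMPT_CATEGORIES)
    in_bounds = df["efficiency"].between(lower, upper)
    return df[exempt | in_bounds]


gen = pd.DataFrame({
    "plant": ["N1", "G1", "G2", "S1"],
    "FuelCategory": ["NUCLEAR", "GAS", "GAS", "SOLAR"],
    "efficiency": [1.414937e06, 2.772934e04, 36.8, 113.7],
})
print(filter_by_efficiency(gen)["plant"].tolist())  # ['N1', 'G2', 'S1']
```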
The method `calculate_plant_efficiency` in eia923_generation.py returns a data frame with duplicated strings in their respective columns (e.g., 'Plant Name', 'State', 'Reported Fuel Type Code'). This is caused by using a single aggregation method, sum, in the pandas `groupby` call. Summing string elements concatenates them (e.g., "ALALAL" for three rows with "AL" as their state). This error propagates through the rest of the code. https://github.com/USEPA/ElectricityLCI/blob/2232c41f2cb4fd333ad59c8710aa55906e6a7ed3/electricitylci/eia923_generation.py#L333
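A minimal reproduction and one possible fix, as a sketch: summing an object (string) column concatenates its values, so per-column aggregation (sum for numeric columns, first for descriptive ones) avoids the duplication. The toy frame and the exact column list are assumptions, not the repository's data or its eventual fix.

```python
import pandas as pd

# Why it happens: summing an object (string) Series concatenates the strings.
print(pd.Series(["AL", "AL", "AL"]).sum())  # ALALAL

gen = pd.DataFrame({
    "Plant Id": [1, 1, 1],
    "Plant Name": ["Plant A", "Plant A", "Plant A"],
    "State": ["AL", "AL", "AL"],
    "Net Generation (Megawatthours)": [100.0, 200.0, 300.0],
    "Total Fuel Consumption MMBtu": [1000.0, 2000.0, 3000.0],
})

# Sum only the numeric columns; keep the first value of each string column per plant.
agg_spec = {
    "Plant Name": "first",
    "State": "first",
    "Net Generation (Megawatthours)": "sum",
    "Total Fuel Consumption MMBtu": "sum",
}
plant_totals = gen.groupby("Plant Id", as_index=False).agg(agg_spec)
print(plant_totals)
```

Using "first" assumes the descriptive columns are constant within a plant; if they can differ (e.g., multiple fuel type codes per plant), those columns would need to be part of the group keys instead.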