Error in calculate_plant_efficiency

dt-woods commented 3 months ago

The method calculate_plant_efficiency in eia923_generation.py returns a data frame with duplicated strings in their respective columns (e.g., 'Plant Name', 'State', 'Reported Fuel Type Code'). This is caused by the use of a single aggregation method, sum, in the pandas groupby call. Summing string elements concatenates them (e.g., "ALALAL" for three rows with "AL" for their state). This error propagates through the rest of the code.

https://github.com/USEPA/ElectricityLCI/blob/2232c41f2cb4fd333ad59c8710aa55906e6a7ed3/electricitylci/eia923_generation.py#L333
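A minimal sketch of the behavior described above, using made-up rows rather than the real EIA 923 data. The per-column `agg` mapping is one possible remedy, not necessarily the fix adopted in the repository; the choice of `"first"` for the label column is an assumption.

```python
import pandas as pd

gen_col = "Net Generation (Megawatthours)"

# Toy stand-in for EIA 923 rows; the values are invented for illustration.
df = pd.DataFrame({
    "Plant Id": [1, 1, 1, 2],
    "State": pd.Series(["AL", "AL", "AL", "GA"], dtype=object),
    gen_col: [10.0, 20.0, 30.0, 5.0],
})

# A single "sum" applied to every column concatenates the string labels.
summed = df.groupby("Plant Id")[["State", gen_col]].sum()

# Per-column aggregation: keep the first label, sum only the numbers.
# ("first" is an assumed choice; it presumes labels are constant per plant.)
fixed = df.groupby("Plant Id").agg({"State": "first", gen_col: "sum"})

print(summed.loc[1, "State"])  # concatenated label
print(fixed.loc[1, "State"])   # single label
```

Both frames carry the same summed generation; only the string columns differ.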

dt-woods commented 3 months ago

@m-jamieson, why is 'Net Generation (Megawatthours)' scaled by 10 here?

https://github.com/USEPA/ElectricityLCI/blob/2232c41f2cb4fd333ad59c8710aa55906e6a7ed3/electricitylci/eia923_generation.py#L336

m-jamieson commented 3 months ago

That's a great question - don't know why it would be. What kind of results does it give? Should either be 0-1 or 0-100. If I had much to do with it, it would be 0-1, not really a fan of 0-100 percentage.

I guess the point is if it's giving right answers, not sure I would change it. I'm also not sure if efficiency is used. I think the filters are based on heat rate (Btu/kWh), so maybe it's also not strictly needed (although maybe helpful for anyone looking at the data).

While we're fixing that, it would probably also be nice to change the hard-coded Btu to MWh conversion that's in the same equation.

dt-woods commented 3 months ago

The "efficiency" that's calculated is used as a filter on efficiency, which is a model configuration setting (egrid_facility_efficiency_filters) with lower and upper bounds set to 10 and 100, respectively.

The results of efficiency (based on the latest push linked above) are limited to a minimum of zero (no negatives, no NaNs, no Infs). There is no maximum bound due to the nature of the calculation:

$$ \frac{\mathrm{Net\ generation\ (MWh)} \times 10}{\mathrm{fuel\ consumption\ (MMBtu)}\times \frac{3.412\ \mathrm{MMBtu}}{1\ \mathrm{MWh}}} \times 100 $$

The largest values are in the thousands and tens of thousands of percent, and the maximum is above one million percent.

In [27]: gen_efficiency['efficiency'].describe()
Out[27]: 
count    1.641200e+04
mean     2.270501e+02
std      1.349595e+04
min      0.000000e+00
25%      2.395281e+01
50%      4.269409e+01
75%      8.589776e+01
max      1.215349e+06
Name: efficiency, dtype: float64

dt-woods commented 3 months ago

Based on EIA923_Schedules_2_3_4_5_M_12_2022_Final.xlsx, page 7 "File Layout" for the two quantities used in the efficiency calculation:

NET GENERATION (megawatthours): Net generation, year to date in megawatthours (MWh). Numeric. This is total electrical output net of station service. In the case of combined heat and power plants, this value is intended to include internal consumption of electricity for the purposes of a production process, as well as power put on the grid.
TOTAL FUEL CONSUMPTION MMBTUS: Total consumption of fuel in MMBtus, year to date. Numeric. Note: this is the total quantity consumed for both electricity and, in the case of combined heat and power plants, process steam production.

dt-woods commented 3 months ago

$$ \frac{\mathrm{Net\ generation\ (MWh)} \times 10}{\mathrm{fuel\ consumption\ (MMBtu)}\times \frac{3.412\ \mathrm{MMBtu}}{1\ \mathrm{MWh}}} \times 100 $$

Notice that the unit conversion is wrong in the current setup :worried:
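
For reference, a dimensionally consistent form would put the conversion factor in the numerator, so that both numerator and denominator are in MMBtu. This is a sketch that assumes 1 MWh = 3.412 MMBtu, as in the expression above, and drops the 10x factor:

$$ \frac{\mathrm{Net\ generation\ (MWh)} \times \frac{3.412\ \mathrm{MMBtu}}{1\ \mathrm{MWh}}}{\mathrm{fuel\ consumption\ (MMBtu)}} \times 100 $$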

dt-woods commented 3 months ago

After removing the 10x scaling factor and using the correct unit conversion from MMBtu to MWh:

In [21]: gen_efficiency['efficiency'].describe()
Out[21]: 
count    1.641200e+04
mean     2.643369e+02
std      1.571230e+04
min      0.000000e+00
25%      2.788641e+01
50%      4.970543e+01
75%      1.000041e+02
max      1.414937e+06
Name: efficiency, dtype: float64

The quantiles look better. Still some super huge numbers.
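
A minimal sketch of the corrected calculation, not the repository's actual code. The function name is hypothetical, and the way invalid values are zeroed is an assumption based on the earlier remark that results have a minimum of zero with no NaNs or Infs.

```python
import numpy as np
import pandas as pd

MMBTU_PER_MWH = 3.412  # conversion factor quoted in the thread


def efficiency_percent(net_gen_mwh: pd.Series, fuel_mmbtu: pd.Series) -> pd.Series:
    """Percent of fuel energy delivered as net electricity (hypothetical helper)."""
    eff = 100.0 * net_gen_mwh * MMBTU_PER_MWH / fuel_mmbtu
    # Zero fuel yields inf or NaN; negative net generation yields negatives.
    # Assumption: all of these are reported as zero.
    eff = eff.replace([np.inf, -np.inf], np.nan).fillna(0.0)
    return eff.clip(lower=0.0)


# Invented values: a normal plant, zero fuel, negative generation, all zeros.
gen = pd.Series([100.0, 50.0, -5.0, 0.0])
fuel = pd.Series([1000.0, 0.0, 200.0, 0.0])
eff = efficiency_percent(gen, fuel)
print(eff)
```

Relative to the earlier expression, this form differs by a factor of about 3.412²/10 ≈ 1.16, which is in line with the shift in the quartiles between the two describe() outputs.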

dt-woods commented 3 months ago

I don't see any references to heat rate; however, if you want efficiency in units of Btu/kWh, that would explain where the 10x scaling factor comes from (the millions cancel from MMBtu/MWh leaving you with Btu/Wh so multiply by 1/10 to get Btu/kWh). The calculation is inverted (kWh/Btu).
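
For reference, heat rate and percent efficiency are reciprocals up to a constant. The sketch below uses hypothetical helper names and invented inputs, and takes 3412 Btu per kWh (the same factor as 3.412 MMBtu per MWh).

```python
BTU_PER_KWH = 3412.0  # energy content of 1 kWh, same factor as 3.412 MMBtu/MWh


def heat_rate_btu_per_kwh(fuel_mmbtu: float, net_gen_mwh: float) -> float:
    """Fuel energy burned per unit of electricity (hypothetical helper)."""
    # MMBtu/MWh equals Btu/Wh, so the value in Btu/kWh is 1000 times larger.
    return fuel_mmbtu * 1e6 / (net_gen_mwh * 1e3)


def efficiency_from_heat_rate(heat_rate: float) -> float:
    """Percent efficiency implied by a heat rate in Btu/kWh."""
    return 100.0 * BTU_PER_KWH / heat_rate


# Invented plant: 10 MMBtu of fuel for 1 MWh of net generation.
hr = heat_rate_btu_per_kwh(fuel_mmbtu=10.0, net_gen_mwh=1.0)
eff = efficiency_from_heat_rate(hr)
print(hr, eff)
```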

m-jamieson commented 2 months ago

Don't worry about modifying anything to use the heat rate - I'm happy with using the percentage. I'm sure it was a mix of all the things above - perhaps previously filtering by heat rate and switching to percentage.

I'm still not really believing the efficiencies that are being calculated, as summarized in your comment above. I just did it by hand for 2020 (EIA923, pivot by plant ID and state, sum of net generation and total fuel consumption mmbtu). I get the following:

| Statistic | Efficiency |
| --- | --- |
| Mean | 40.4% |
| Min | -6,369% |
| 10% | 22.2% |
| 25% | 33.2% |
| 50% | 38.9% |
| 75% | 38.9% |
| 90% | 38.9% |
| Max | 36,729% |

There's a strong peak around the 39% region because hydro, solar, wind (and probably more) use the same "efficiency" to calculate the "heat input". I guess further, my pivot only results in 10,468 entries. I get similar results for 2016 except the 37% is the efficiency for renewables. See the histogram for 2020 below.
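
The by-hand check described above (pivot by plant ID and state, then sum net generation and fuel consumption) can be sketched in pandas as follows. The rows are invented, the 'Plant Id' column name is an assumption, and the 3.412 factor is the one quoted earlier in the thread.

```python
import pandas as pd

gen_col = "Net Generation (Megawatthours)"
fuel_col = "Total Fuel Consumption MMBtu"

# Invented rows standing in for the EIA 923 generation and fuel data.
rows = pd.DataFrame({
    "Plant Id": [1, 1, 2, 3],
    "State": ["AL", "AL", "GA", "TX"],
    gen_col: [60.0, 40.0, 200.0, 50.0],
    fuel_col: [600.0, 400.0, 1754.0, 500.0],
})

# One row per plant and state, summing only the numeric quantities.
totals = rows.groupby(["Plant Id", "State"], as_index=False)[[gen_col, fuel_col]].sum()
totals["efficiency"] = 100.0 * totals[gen_col] * 3.412 / totals[fuel_col]

# Same statistics as in the table above.
summary = totals["efficiency"].quantile([0.10, 0.25, 0.50, 0.75, 0.90])
print(totals)
print(summary)
```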

dt-woods commented 2 months ago

Re-ran for 2020 EIA 923:

In [14]: gen_efficiency.describe()
Out[14]: 
       Total Fuel Consumption MMBtu  Net Generation (Megawatthours)    efficiency
count                  9.154000e+03                    9.154000e+03   9154.000000
mean                   3.737781e+06                    4.056320e+05     42.601295
std                    1.618635e+07                    1.679260e+06    516.474022
min                    0.000000e+00                   -8.454800e+05      0.000000
25%                    2.319000e+04                    2.514500e+03     34.901318
50%                    8.148050e+04                    9.108000e+03     38.920222
75%                    9.935818e+05                    1.019032e+05     38.920410
max                    3.295960e+08                    3.155243e+07  36728.903669

Results appear the same (or rather similar) to yours, Matt.

Then again for 2021:

In [20]: gen_efficiency.describe()
Out[20]: 
       Total Fuel Consumption MMBtu  Net Generation (Megawatthours)    efficiency
count                  9.711000e+03                    9.711000e+03   9711.000000
mean                   3.641924e+06                    3.934643e+05     45.273045
std                    1.606625e+07                    1.656508e+06    577.061684
min                    0.000000e+00                   -7.503480e+05      0.000000
25%                    2.414850e+04                    2.619000e+03     38.068001
50%                    8.209200e+04                    9.124000e+03     38.581376
75%                    9.694300e+05                    1.019680e+05     38.581573
max                    3.298714e+08                    3.162986e+07  37253.874224

Similar to 2020.

And again for 2022.

In [23]: gen_efficiency.describe()
Out[23]: 
       Total Fuel Consumption MMBtu  Net Generation (Megawatthours)    efficiency
count                  1.032100e+04                    1.032100e+04  1.032100e+04
mean                   2.292835e+06                    3.817568e+05  2.353918e+02
std                    1.008945e+07                    1.612123e+06  1.400607e+04
min                    0.000000e+00                   -1.121756e+06  0.000000e+00
25%                    1.025700e+04                    2.759000e+03  4.434713e+01
50%                    3.184800e+04                    8.877000e+03  1.000032e+02
75%                    4.389610e+05                    9.959000e+04  1.000045e+02
max                    2.137728e+08                    3.194279e+07  1.414937e+06

The problem appears to be something in the 2022 EIA 923.

dt-woods commented 2 months ago

2022 efficiencies by fuel category:

In [32]: final_gen_df.groupby(by="FuelCategory")['efficiency'].agg(['count', 'min', 'mean', 'std', 'max'])
Out[32]: 
              count         min          mean            std           max
FuelCategory                                                              
BIOMASS         327    0.000000     27.377079       6.936788  8.061011e+01
COAL            164    7.228735     30.450537       3.811412  3.841347e+01
GAS            1154    0.000000     81.601967    1099.427468  2.772934e+04
GEOTHERMAL       60  100.002721    100.004127       0.000512  1.000062e+02
HYDRO          1344    0.000000     99.209471      57.400020  1.756620e+03
MIXED           590    0.000000     13.669213      22.168856  2.111309e+02
NUCLEAR          54    0.000000  28919.141072  193205.578104  1.414937e+06
OIL             564    0.000000     28.478144      25.958422  3.834597e+02
OTHF            103    0.000000      7.767254      26.896938  1.000054e+02
SOLAR          4714    0.000000     99.885302       2.350605  1.137381e+02
SOLARTHERMAL      9   74.325506     91.637267      11.480540  1.000042e+02
WIND           1238   83.198613     99.985083       0.484713  1.001167e+02

Nuclear seems to be a problem. Also, the renewables are much higher than 37% (e.g., >90%).

Compare this to the 2020 efficiencies by fuel category:

In [35]: final_gen_df.groupby(by="FuelCategory")['efficiency'].agg(['count', 'min', 'mean', 'std', 'max'])
Out[35]: 
              count        min       mean          std           max
FuelCategory                                                        
BIOMASS         349   0.000000  27.183248     6.025765     75.683414
COAL            184   7.495489  30.233958     3.770085     38.787972
GAS            1099   0.000000  98.903593  1488.364555  36728.903669
GEOTHERMAL       57  38.919772  38.920269     0.000097     38.920464
HYDRO          1351   0.000000  38.482296    15.811447    439.955311
MIXED           602   0.000000  13.903016    84.259012   2043.019803
NUCLEAR          57  32.489278  32.660164     0.024329     32.664691
OIL             554   0.000000  25.445353    12.877129     91.396651
OTHF             84   0.000000   3.706687    11.493393     38.920363
SOLAR          3648  14.627729  38.897192     0.475619     40.773906
SOLARTHERMAL      8  33.913723  37.011544     2.504891     38.920302
WIND           1161  38.188003  38.920573     0.042804     40.142843

m-jamieson commented 2 months ago

So clearly 2022 renewables are all at ~90+% efficiency. Those were pretty meaningless to begin with, so I'm not terribly concerned about those. Something is way off on the gas for both 2020 and 2022 though.

Gas is very weird for both years. Manually checking 2022 EIA 923, I'm getting a mean of ~31% for gas plant efficiency. If I filter by sectors 1 and 2, as would be expected I end up with a mean of 36.8% for gas, which makes sense to me.
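
A sketch of the sector comparison described above, on invented numbers. The 'Sector Number' column name is an assumption about the data layout, and the frame is assumed to already hold one efficiency value per plant.

```python
import pandas as pd

# Invented per-plant results; 'Sector Number' is an assumed column name.
plants = pd.DataFrame({
    "FuelCategory": ["GAS", "GAS", "GAS", "COAL"],
    "Sector Number": [1, 2, 7, 1],
    "efficiency": [40.0, 35.0, 12.0, 31.0],
})

is_gas = plants["FuelCategory"] == "GAS"
in_sector = plants["Sector Number"].isin([1, 2])

# Mean over all gas plants versus gas plants in sectors 1 and 2 only.
all_gas_mean = plants.loc[is_gas, "efficiency"].mean()
sector_gas_mean = plants.loc[is_gas & in_sector, "efficiency"].mean()
print(all_gas_mean, sector_gas_mean)
```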

Maybe something strange going on when reading the EIA923 files?

m-jamieson commented 2 months ago

I glanced at some gas efficiencies while looking at some of the interim data frames, and they seemed okay. As kind of an additional check, the percent of gas in the US generation mix seems to be about right, suggesting not many plants are being caught up in this filter. Regardless, I've made a commit with a hotfix of sorts to ignore the efficiencies for renewable + nuclear since the efficiencies for these technologies were never really meaningful in the first place.
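
A rough sketch of what such a hotfix could look like; it is not the code from the commit. The function name and the set of exempt fuel categories are assumptions, and the bounds are the 10 and 100 quoted earlier in the thread.

```python
import pandas as pd

LOWER, UPPER = 10.0, 100.0  # filter bounds quoted earlier in the thread

# Assumed list of categories whose efficiency is not physically meaningful.
EXEMPT = {"GEOTHERMAL", "HYDRO", "NUCLEAR", "SOLAR", "SOLARTHERMAL", "WIND"}


def passes_efficiency_filter(df: pd.DataFrame) -> pd.Series:
    """Boolean mask of plants to keep (hypothetical helper)."""
    in_bounds = df["efficiency"].between(LOWER, UPPER)  # inclusive on both ends
    exempt = df["FuelCategory"].isin(EXEMPT)
    return in_bounds | exempt


# Invented plants, with values shaped like the 2022 extremes above.
plants = pd.DataFrame({
    "FuelCategory": ["GAS", "NUCLEAR", "COAL", "WIND"],
    "efficiency": [27729.0, 1.4e6, 30.0, 100.1],
})
keep = passes_efficiency_filter(plants)
print(plants[keep])
```

With this mask, the out-of-range gas plant is still dropped, while the nuclear and wind plants pass regardless of their computed efficiency.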

Error in calculate_plant_efficiency #247