catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars 105 forks

Clean up small gen capacity factors #1083

Open aesharpe opened 2 years ago

aesharpe commented 2 years ago

[Image: image.png, histograms of annual capacity factors for FERC Form 1 small plants]

zaneselvans commented 2 years ago

Here's the code for the above plots:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# pudl_engine is assumed to be an existing SQLAlchemy engine connected to the PUDL database.
plants_small_ferc1 = pd.read_sql("plants_small_ferc1", pudl_engine)
# Annual capacity factor: net generation over the maximum possible generation (8760 hours/year).
cf = plants_small_ferc1.net_generation_mwh / (plants_small_ferc1.capacity_mw * 8760)
# Drop NaN and infinite values (e.g. missing or zero capacity) before plotting.
finite_cf = cf[np.isfinite(cf)]

# Plot the same data over three x-axis ranges, each 1000x narrower than the last.
for upper_limit in (1.25e3, 1.25, 1.25e-3):
    plt.hist(finite_cf, bins=100, range=(0, upper_limit), log=True)
    plt.xlabel("Capacity Factor")
    plt.ylabel("Number of Records")
    plt.title("Annual Capacity Factors of FERC Form 1 Small Plants")
    plt.show()

Not sure whether the unit errors are in kW vs. MW reporting of capacity_mw or in the kWh vs. MWh reporting of net_generation_mwh. Unfortunately it's probably a mix of both 😭
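To make the ambiguity concrete, here is a minimal sketch using a made-up plant (2 MW, 8760 MWh per year, not a real record). The helper `capacity_factor` and all the numbers are assumptions for illustration only. It shows that capacity reported in kW pushes the capacity factor down by 1000x, generation reported in kWh pushes it up by 1000x, and both errors together cancel out, which is why the capacity factor alone can't tell us which column is wrong.

```python
HOURS_PER_YEAR = 8760


def capacity_factor(net_generation_mwh, capacity_mw):
    """Annual capacity factor, same formula as in the plotting code above."""
    return net_generation_mwh / (capacity_mw * HOURS_PER_YEAR)


# Hypothetical plant: 2 MW of capacity generating 8760 MWh in a year.
true_cf = capacity_factor(8760.0, 2.0)

# Capacity mistakenly reported in kW (2000) in the capacity_mw column.
cf_capacity_in_kw = capacity_factor(8760.0, 2000.0)

# Generation mistakenly reported in kWh (8,760,000) in the net_generation_mwh column.
cf_generation_in_kwh = capacity_factor(8_760_000.0, 2.0)

# Both errors in the same record: the two factors of 1000 cancel.
cf_both_wrong = capacity_factor(8_760_000.0, 2000.0)

print(f"correct units:        {true_cf:g}")
print(f"capacity in kW:       {cf_capacity_in_kw:g}")
print(f"generation in kWh:    {cf_generation_in_kwh:g}")
print(f"both columns wrong:   {cf_both_wrong:g}")
```

A record with both errors looks perfectly plausible, so any cleanup based only on the capacity factor would miss it.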

cmgosnell commented 2 years ago

I don't understand how the first two graphs here can have the same scale on the y-axis and have such wildly different capacity factors. Shouldn't the first graph include everything from the second, but in just the first bar? And thus shouldn't that first bar be larger than it is? Is this just a log thing, and does the first bar in graph 1 truly include everything in graph 2?

aesharpe commented 1 year ago

This will be worth addressing in #1735

zaneselvans commented 1 year ago

@cmgosnell It's a logarithmic scale vertically, so the small difference in the height of the first bar is really more like a factor of 5-8x. And you're right that the whole second plot comes from just the first bin of the first plot. We'll need another source of information to resolve the MW/kW vs. MWh/kWh ambiguity.
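A rough way to check this binning argument without plotting is to compare the counts directly with `np.histogram`. The sketch below uses synthetic capacity factors (random numbers, not PUDL data) and the same bin settings as the first two plots. The variable names and the mix of "plausible" and "inflated" values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic capacity factors (NOT PUDL data): mostly plausible values in [0, 1),
# plus some records inflated 1000x to mimic a kWh-vs-MWh reporting error.
plausible = rng.uniform(0.0, 1.0, size=5000)
inflated = rng.uniform(0.0, 1.0, size=500) * 1000
cf = np.concatenate([plausible, inflated])

# Same binning as the first two plots: 100 bins over (0, 1250) and over (0, 1.25).
wide_counts, wide_edges = np.histogram(cf, bins=100, range=(0, 1.25e3))
narrow_counts, _ = np.histogram(cf, bins=100, range=(0, 1.25))

first_wide_bin = wide_counts[0]          # covers [0, 12.5)
narrow_total = narrow_counts.sum()       # everything shown in the second plot
tallest_narrow_bar = narrow_counts.max()

print("first bin of wide histogram covers up to:", wide_edges[1])
print("records in first wide bin:", first_wide_bin)
print("records in entire narrow histogram:", narrow_total)
print("tallest single bar in narrow histogram:", tallest_narrow_bar)
# On a log axis, bar heights differ by log10 of the count ratio, so a large
# ratio in counts shows up as a modest difference in visual height.
print("log10(first wide bin / tallest narrow bar):",
      np.log10(first_wide_bin / tallest_narrow_bar))
```

The first bin of the wide histogram always contains at least as many records as the whole narrow histogram, since the interval 0 to 12.5 contains 0 to 1.25. The plots only look similar in height because of the log scale.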