Closed zaneselvans closed 10 months ago
Extremely low (11.5%) coverage of unit_id_pudl and partial (~70%) coverage of fuel prices mean that only a small fraction of all heat rates (10.1%) and fuel costs (6.5%) can be estimated at all. Has it always been this bad?
In the MCOE output we have far better coverage of net_generation_mwh than of total_mmbtu, which I think is the equivalent value on the fuel side. Now that we have allocated estimates for both net generation and fuel consumption, should we be using those estimates in tandem in this analysis to estimate heat rates by generator-month? What's the difference between these two estimates of total fuel consumption at the generator level?
import matplotlib.pyplot as plt

data_cols = [
    "unit_id_pudl",
    "capacity_factor",
    "fuel_cost_per_mmbtu",
    "fuel_cost_per_mwh",
    "heat_rate_mmbtu_mwh",
    "net_generation_mwh",
    "total_fuel_cost",
    "total_mmbtu",
]
# Fraction of null values in each column, by report date.
dude = (
    mcoe_monthly[["report_date"] + data_cols]
    .groupby("report_date")
    .apply(lambda x: x.isna().sum() / len(x))
)
for col in data_cols:
    plt.plot(dude[col], label=col)
plt.ylabel("Missingness")
plt.legend()
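For anyone who wants to sanity check what that missingness calculation produces without loading the real MCOE table, here is a minimal sketch on a toy dataframe. The values are invented purely for illustration, and `toy` and `value_cols` are hypothetical names, not anything in PUDL. It uses `isna()` followed by a grouped `mean()`, which should give the same per-date null fraction as the `apply` above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the MCOE table; all values are invented for illustration.
toy = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-02-01", "2020-02-01"]
        ),
        "net_generation_mwh": [100.0, 50.0, 80.0, np.nan],
        "total_mmbtu": [np.nan, np.nan, 800.0, np.nan],
    }
)

value_cols = ["net_generation_mwh", "total_mmbtu"]
# Fraction of null values per column within each report_date.
missingness = toy[value_cols].isna().groupby(toy["report_date"]).mean()
print(missingness)
```

Each row of `missingness` is a report date and each cell is the share of records that are null for that column, so it can be plotted the same way as above.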
Using the current allocations of fuel consumption and net generation instead, we get much better coverage. And we're already using the allocated net generation side of things, so why would we not also want to use the per-generator fuel allocations? Do we not trust them as much?
from dagster import AssetKey

# `defs` is assumed to be the PUDL Dagster Definitions object, already imported.
gf_by_gen_monthly = defs.load_asset_value(
    AssetKey("generation_fuel_by_generator_monthly_eia923")
)
# Heat rate from the allocated fuel consumption and net generation.
gf_by_gen_monthly["heat_rate_mmbtu_mwh"] = (
    gf_by_gen_monthly.fuel_consumed_for_electricity_mmbtu
    / gf_by_gen_monthly.net_generation_mwh
)
data_cols = [
    "unit_id_pudl",
    "heat_rate_mmbtu_mwh",
    "net_generation_mwh",
    "fuel_consumed_for_electricity_mmbtu",
]
# Fraction of null values in each column, by report date.
dude = (
    gf_by_gen_monthly[["report_date"] + data_cols]
    .groupby("report_date")
    .apply(lambda x: x.isna().sum() / len(x))
)
for col in data_cols:
    plt.plot(dude[col], label=col)
plt.ylabel("Missingness")
plt.legend()
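One way to start answering the question of how the two fuel consumption estimates differ would be to join them at the generator-month level and look at the relative difference wherever both exist. The sketch below is only an illustration on invented data. It assumes both tables can be keyed on `report_date`, `plant_id_eia`, and `generator_id`; those key columns and the `mcoe`, `allocated`, and `rel_diff` names are my assumptions, not something confirmed above. It also masks zero or negative net generation before dividing, since a plain division would give infinite heat rates for those records.

```python
import numpy as np
import pandas as pd

# Assumed generator-month keys; not confirmed by the tables discussed above.
keys = ["report_date", "plant_id_eia", "generator_id"]

# Toy stand-ins for the two tables; all values are invented.
mcoe = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "plant_id_eia": [1, 1, 2],
        "generator_id": ["a", "b", "a"],
        "total_mmbtu": [1000.0, np.nan, np.nan],
    }
)
allocated = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-01-01"] * 3),
        "plant_id_eia": [1, 1, 2],
        "generator_id": ["a", "b", "a"],
        "fuel_consumed_for_electricity_mmbtu": [950.0, 500.0, 300.0],
        "net_generation_mwh": [100.0, 50.0, 0.0],
    }
)

# validate raises if either side has duplicate generator-month records.
both = mcoe.merge(allocated, on=keys, how="outer", validate="one_to_one")

# Relative difference where both estimates exist; NaN elsewhere.
both["rel_diff"] = (
    both["fuel_consumed_for_electricity_mmbtu"] - both["total_mmbtu"]
) / both["total_mmbtu"]

# Heat rate from the allocated values, masking zero or negative generation
# so the division never yields inf.
gen = both["net_generation_mwh"].where(both["net_generation_mwh"] > 0)
both["heat_rate_mmbtu_mwh"] = both["fuel_consumed_for_electricity_mmbtu"] / gen

print(both[keys + ["rel_diff", "heat_rate_mmbtu_mwh"]])
```

On the real tables, a histogram of `rel_diff` and a count of records where only one side is non-null would probably show both how much the estimates disagree and how much coverage the allocation adds.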
Given that we're calculating all these metrics by generator and timestep and then joining them all together, it seems like maybe we should just load the final MCOE table into the database, rather than all of the intermediary assets like capacity_factor and fuel_cost and hr_by_gen. Are there places outside of the MCOE calculation that these intermediary values are being accessed / calculated directly? I guess we'll find out.
For the FERC to EIA analysis:
Where to go from here after chatting with Christina:
After running the data validation and full integration tests:
all_gens argument which doesn't do anything, so the plant_parts_eia tests should still fail (or behave strangely).
gens_cols and all_gens functionality that is needed in the context of the FERC to EIA / Plant Parts downstream needs to be implemented downstream of MCOE.