In #3625 it seemed odd that there was no 2023 data showing up in the out_eia923__monthly_generation_fuel_by_generator table, even with 11 months of 2023 incremental_ytd records from the EIA-923:
import pandas as pd

# pudl_engine is assumed to be a SQLAlchemy engine connected to the PUDL SQLite database.
gen_eia923_ms = pd.read_sql("out_eia923__monthly_generation", pudl_engine)
gen_eia923_ys = pd.read_sql("out_eia923__yearly_generation", pudl_engine)
gf_by_gen_eia923_ms = pd.read_sql("out_eia923__monthly_generation_fuel_by_generator", pudl_engine)
gf_by_gen_eia923_ys = pd.read_sql("out_eia923__yearly_generation_fuel_by_generator", pudl_engine)
frc_eia923 = pd.read_sql("out_eia923__monthly_fuel_receipts_costs", pudl_engine)
print(f"gen MS: {gen_eia923_ms.report_date.max()}")
print(f"gen YS: {gen_eia923_ys.report_date.max()}")
print(f"gen fuel by gen MS: {gf_by_gen_eia923_ms.report_date.max()}")
print(f"gen fuel by gen YS: {gf_by_gen_eia923_ys.report_date.max()}")
print(f"frc MS: {frc_eia923.report_date.max()}")
# gen MS: 2024-12-01 00:00:00
# gen YS: 2023-01-01 00:00:00
# gen fuel by gen MS: 2022-12-01 00:00:00
# gen fuel by gen YS: 2022-01-01 00:00:00
# frc MS: 2024-02-01 00:00:00
This seems a little fishy. We use pudl.output.eia923.drop_ytd_for_annual_tables() to avoid "annual" aggregations of data where we don't have a whole year of data, but here it seems like we're also somehow excluding monthly year-to-date records, which I don't think is intentional. And drop_ytd_for_annual_tables() does not get called when freq=="MS".
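To make the expected behavior concrete, here is a minimal sketch of the kind of filtering described above. It is not the actual implementation of drop_ytd_for_annual_tables(); the helper name drop_partial_years, the data_maturity column, and the "YS" frequency code are assumptions made for illustration only.

```python
import pandas as pd


def drop_partial_years(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Hypothetical stand-in: drop year-to-date rows only for annual output."""
    if freq != "YS":
        # Monthly output should keep every record, including year-to-date months.
        return df
    # Assumed column: data_maturity flags records from incomplete years.
    return df[df["data_maturity"] != "incremental_ytd"]


records = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2022-12-01", "2023-01-01", "2023-11-01"]),
        "data_maturity": ["final", "incremental_ytd", "incremental_ytd"],
        "net_generation_mwh": [10.0, 11.0, 12.0],
    }
)

monthly = drop_partial_years(records, freq="MS")
annual_input = drop_partial_years(records, freq="YS")
print(monthly["report_date"].max())
print(annual_input["report_date"].max())
```

If the real function behaves along these lines, the monthly table would retain its 2023 records, so the truncation must be coming from somewhere else.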
Investigate why this truncation is happening, and evaluate whether that's the expected / desired behavior.
Possible explanation
The out_eia923__monthly_generation_fuel_by_generator table depends on the fuel & generation allocation process, which depends on the boiler generator association (BGA) table. That table is only available from the annual EIA-860, not the monthly EIA-860M data. So it makes sense that we don't currently have the allocated generation & fuel table for periods in which there's only EIA-860M data.
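A toy example of how this kind of dependency can truncate the output: if monthly records are joined against an annual table that stops a year earlier, the later months drop out. The real allocation process is much more involved and may not use a join like this at all; the column names and values below are simplified assumptions.

```python
import pandas as pd

# Toy monthly generation records covering 2022 and part of 2023.
gen = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1],
        "generator_id": ["G1", "G1", "G1"],
        "report_date": pd.to_datetime(["2022-12-01", "2023-01-01", "2023-11-01"]),
        "net_generation_mwh": [10.0, 11.0, 12.0],
    }
)

# Toy boiler generator association table: annual, and only available through 2022.
bga = pd.DataFrame(
    {
        "plant_id_eia": [1],
        "generator_id": ["G1"],
        "boiler_id": ["B1"],
        "report_year": [2022],
    }
)

# Join monthly records to the annual associations by year.
gen["report_year"] = gen["report_date"].dt.year
allocated = gen.merge(
    bga, on=["plant_id_eia", "generator_id", "report_year"], how="inner"
)
print(allocated["report_date"].max())
```

Only the 2022 record survives the join, which matches the pattern seen in the by-generator tables.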
If we wanted to hack it to give us some estimate of the most recent allocated data, we could just forward fill the BGA table up to the most recent year. It would be mostly right, since these associations don't really change unless there's a major overhaul to a plant, but we're not doing that now.
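A rough sketch of what that forward fill could look like, assuming a simplified BGA table keyed by report_year. The helper forward_fill_bga and the column names are hypothetical and not part of PUDL.

```python
import pandas as pd


def forward_fill_bga(bga: pd.DataFrame, through_year: int) -> pd.DataFrame:
    """Copy the latest year of associations into each later year (hypothetical helper)."""
    last_year = int(bga["report_year"].max())
    latest = bga[bga["report_year"] == last_year]
    # One copy of the latest associations per missing year.
    filled = [
        latest.assign(report_year=year)
        for year in range(last_year + 1, through_year + 1)
    ]
    return pd.concat([bga, *filled], ignore_index=True)


bga = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "generator_id": ["G1", "G1", "G7"],
        "boiler_id": ["B1", "B1", "B7"],
        "report_year": [2021, 2022, 2022],
    }
)

bga_filled = forward_fill_bga(bga, through_year=2023)
print(bga_filled[bga_filled["report_year"] == 2023])
```

This carries the 2022 associations into 2023 unchanged, so any plant that was overhauled in the filled years would get stale associations.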