catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

make the allocated net generation work at non-annual frequencies #1468

Closed cmgosnell closed 1 year ago

cmgosnell commented 2 years ago

The problem

The allocation process employs three tables: the generation fuel table, the original generation table, and the generators table (accessed via pudl_out.gf_eia923(), pudl_out.gen_original_eia923(), and pudl_out.gens_eia860(), respectively). Both the gf and gen tables are originally reported monthly, while the gens table is annual. The allocation process doesn’t know to make the gens table monthly before trying to allocate. Given that, I think ensuring the allocation process knows how to deal with monthly data would be a relatively straightforward fix.
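To make the mismatch concrete, here is a minimal sketch with made-up toy data (not the PUDL allocation code; the frames and values are assumptions for illustration). A plain merge on report_date between a monthly table and an annual table only matches the January record, because the annual table carries a single date per year.

```python
import pandas as pd

# Toy monthly generation table: one generator, 12 monthly records.
gen_monthly = pd.DataFrame({
    "plant_id_eia": 1,
    "generator_id": "A",
    "report_date": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "net_generation_mwh": 100.0,
})

# Toy annual generators table: one record per generator-year, dated Jan 1.
gens_annual = pd.DataFrame({
    "plant_id_eia": [1],
    "generator_id": ["A"],
    "report_date": pd.to_datetime(["2020-01-01"]),
    "capacity_mw": [50.0],
})

# Naive merge that treats report_date as an exact key.
merged = gen_monthly.merge(
    gens_annual,
    on=["plant_id_eia", "generator_id", "report_date"],
    how="left",
)

# Count the monthly rows that picked up the annual attributes.
n_matched = int(merged["capacity_mw"].notna().sum())
print(n_matched)  # only the January row matches
```

The other eleven months end up with missing generator attributes, which is the kind of gap the allocation would then have to work around.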

Seeing the problem

import matplotlib.pyplot as plt
import pudl

# instantiate a monthly pudl_out object
# (pudl_engine is assumed to be an existing SQLAlchemy engine for the PUDL database)
pudl_out_ms = pudl.output.pudltabl.PudlTabl(
    pudl_engine,
    freq='MS',
    fill_net_gen=True,
)
gen = pudl_out_ms.gen_eia923()
# the two interim outputs
gen_original = pudl_out_ms.gen_original_eia923()
gen_allocated = pudl_out_ms.gen_allocated_eia923()

# make the annual version for comparison
pudl_out_as = pudl.output.pudltabl.PudlTabl(
    pudl_engine,
    freq='AS',
    fill_fuel_cost=True,
    roll_fuel_cost=True,
    fill_net_gen=True,
)
gen_as = pudl_out_as.gen_eia923()

# make some plots
gen_options = {
    "Monthly Allocated": gen_allocated,
    "Monthly Original": gen_original,
    "Annual Allocated": gen_as
}
for gen_type, df in gen_options.items():
    gen_sum = df.groupby(['report_date'])[['net_generation_mwh']].sum().sort_index()
    plt.plot(
        gen_sum.index, gen_sum.net_generation_mwh, '.--',
        label=gen_type
    )
plt.legend()
plt.title("Net Generation")
plt.show()

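[image.png: plot titled "Net Generation" comparing the Monthly Allocated, Monthly Original, and Annual Allocated series by report_date]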

The solution??

I thiiiiink this could all be solved by using pudl.helpers.clean_merge_asof in pudl.analysis.allocate_net_gen.associate_generator_tables

cmgosnell commented 2 years ago

@zaneselvans would I need to employ some new homebrew solution for this, because the higher-frequency dataframe (the monthly generation table) is the less complete table? clean_merge_asof requires the left df to be the higher-frequency df and also effectively employs a left merge.
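A small sketch of that concern using pandas.merge_asof directly, which is also a left join. Whether clean_merge_asof wraps it exactly this way is an assumption, and the toy frames are made up. Generator B exists only in the annual table, so it never shows up in the result.

```python
import pandas as pd

# Toy monthly generation table: only generator A reports here.
gen_monthly = pd.DataFrame({
    "generator_id": ["A", "A", "A"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "net_generation_mwh": [10.0, 12.0, 11.0],
})

# Toy annual generators table: both A and B exist.
gens_annual = pd.DataFrame({
    "generator_id": ["A", "B"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "capacity_mw": [50.0, 75.0],
})

# merge_asof needs both frames sorted by the "on" key and keeps only left rows.
out = pd.merge_asof(
    gen_monthly.sort_values("report_date"),
    gens_annual.sort_values("report_date"),
    on="report_date",
    by="generator_id",
    direction="backward",
)

# Every monthly row for A finds its annual record; B is silently absent.
print(sorted(out["generator_id"].unique()))
```

So with the monthly gen table on the left, any generator missing from it is dropped, even though the gens table knows about it.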

zaneselvans commented 2 years ago

Remember that we hated clean_merge_asof because it is slow AF and has some weird edge case behavior. I thought we wanted to do something simpler that just decomposes dates into year, month, and day columns, and then merges treating them like independent entities? Which I think would work in this case as well.
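Roughly what that simpler approach could look like, as a sketch only: merge_annual_onto_monthly is a hypothetical helper name, not an existing PUDL function, and the toy data is made up. The idea is to pull the year out of report_date on both sides and do an ordinary merge on the ID columns plus the year.

```python
import pandas as pd


def merge_annual_onto_monthly(monthly, annual, id_cols):
    """Hypothetical helper: attach annual records to monthly rows by year."""
    monthly = monthly.assign(report_year=monthly["report_date"].dt.year)
    annual = (
        annual.assign(report_year=annual["report_date"].dt.year)
        .drop(columns="report_date")  # keep the monthly report_date only
    )
    return monthly.merge(
        annual,
        on=[*id_cols, "report_year"],
        how="left",
        validate="m:1",  # fail loudly if the annual table has duplicate keys
    )


# Toy data: 12 monthly records and one annual record for generator A.
gen_monthly = pd.DataFrame({
    "generator_id": "A",
    "report_date": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "net_generation_mwh": 100.0,
})
gens_annual = pd.DataFrame({
    "generator_id": ["A"],
    "report_date": pd.to_datetime(["2020-01-01"]),
    "capacity_mw": [50.0],
})

result = merge_annual_onto_monthly(gen_monthly, gens_annual, ["generator_id"])
print(result[["report_date", "capacity_mw"]].head())
```

This is still a left merge, so generators that appear only in the annual table would need separate handling.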

cmgosnell commented 2 years ago

Oh definitely this has never been a long-term solution. Maybe it was @TrentonBush who suggested a clean version that decomposed the dates?

TrentonBush commented 2 years ago

That rings a bell, but 10 minutes of digging only surfaced this old issue, #1106, which touched on generating time series on a Cartesian product of ID fields.
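One way to read the Cartesian product idea, as a sketch with made-up toy data (this is not taken from #1106): build a complete monthly skeleton from the generator IDs and the reporting months, then merge other tables onto it.

```python
import pandas as pd

# Toy annual generators table with two generators.
gens_annual = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["A", "B"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "capacity_mw": [50.0, 75.0],
})

# Every month in the reporting year.
months = pd.DataFrame({
    "report_date": pd.date_range("2020-01-01", "2020-12-01", freq="MS"),
})

# Cartesian product of the unique generator IDs and the months.
ids = gens_annual[["plant_id_eia", "generator_id"]].drop_duplicates()
skeleton = ids.merge(months, how="cross")

print(len(skeleton))  # 2 generators x 12 months
```

Because the skeleton comes from the ID fields rather than from the monthly generation table, it includes generators that never report monthly generation.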

zaneselvans commented 1 year ago

@cmgosnell is this issue closed by #1608 from @grgmiller?