Closed (cmgosnell closed this 1 year ago)
@zaneselvans would I need to employ some new homebrew solution for this, because the higher-frequency dataframe (the monthly generation table) is the less complete table? clean_merge_asof requires the left df to be the higher-frequency df and also effectively employs a left merge.
Remember that we hated clean_merge_asof because it is slow AF and has some weird edge-case behavior. I thought we wanted to do something simpler that just decomposes dates into year, month, and day columns, and then merges treating them like independent entities? Which I think would work in this case as well.
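A minimal sketch of that date-decomposition idea, not an actual PUDL implementation. The table contents are made up, the column names (plant_id_eia, generator_id, report_date, net_generation_mwh, capacity_mw) are assumptions, and merge_on_year is a hypothetical helper:

```python
import pandas as pd

# Made-up monthly generation records; column names are assumptions.
gen_monthly = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["a", "a", "a"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "net_generation_mwh": [10.0, 12.0, 11.0],
})

# Made-up annual generator records; generator "b" never shows up
# in the monthly table, i.e. the monthly table is the less complete one.
gens_annual = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["a", "b"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "capacity_mw": [50.0, 25.0],
})


def merge_on_year(monthly, annual, how="left"):
    """Hypothetical helper: join monthly and annual tables on IDs plus year."""
    monthly = monthly.assign(report_year=monthly["report_date"].dt.year)
    # Drop the annual report_date so only the monthly date survives the merge.
    annual = annual.assign(
        report_year=annual["report_date"].dt.year
    ).drop(columns="report_date")
    keys = ["plant_id_eia", "generator_id", "report_year"]
    return monthly.merge(annual, on=keys, how=how)


left_merged = merge_on_year(gen_monthly, gens_annual, how="left")
outer_merged = merge_on_year(gen_monthly, gens_annual, how="outer")
```

Because this is an ordinary merge, the how argument is free to choose. With how="outer", generator "b" survives even though it has no monthly records, although its report_date comes back as NaT and would still need to be filled in.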
Oh definitely this has never been a long-term solution. Maybe it was @TrentonBush who suggested a clean version that decomposed the dates?
That rings a bell but 10 minutes of digging only surfaced this old issue #1106 that touched on generating timeseries on a cartesian product of ID fields
@cmgosnell is this issue closed by #1608 from @grgmiller?
The problem
The allocation process employs three tables: the generation fuel table, the original generation table and the generators table (accessed via pudl_out.gf_eia923(), pudl_out.gen_original_eia923() and pudl_out.gens_eia860() respectively). Both the gf and gen tables are originally reported monthly, while the gens table is annual. The allocation process doesn’t know to make the gens table monthly before trying to allocate. Given that, I think ensuring the allocation process knows how to deal with monthly data would be a relatively straightforward fix.
Seeing the problem
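One way to see the frequency mismatch, as a minimal sketch with made-up data. This is not the actual allocation code. It assumes the annual records carry a January 1 report_date and that the tables are joined directly on report_date; the column names are also assumptions:

```python
import pandas as pd

# Made-up monthly generation fuel records for one plant.
gf_monthly = pd.DataFrame({
    "plant_id_eia": [1] * 12,
    "report_date": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "fuel_consumed_mmbtu": [100.0] * 12,
})

# Made-up annual generator record, dated January 1 (an assumption).
gens_annual = pd.DataFrame({
    "plant_id_eia": [1],
    "generator_id": ["a"],
    "report_date": pd.to_datetime(["2020-01-01"]),
    "capacity_mw": [50.0],
})

# Joining directly on report_date only finds a generator for January.
naive = gf_monthly.merge(
    gens_annual, on=["plant_id_eia", "report_date"], how="left"
)
matched_months = int(naive["generator_id"].notna().sum())
print(matched_months)  # prints 1
```

The other eleven months end up with no generator attributes attached, so there is nothing to allocate to.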
The solution??
I thiiiiink this could all be solved by using pudl.helpers.clean_merge_asof in pudl.analysis.allocate_net_gen.associate_generator_tables.
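For illustration, here is a rough sketch of the as-of merge idea using plain pandas.merge_asof as a stand-in. It does not show the signature or behavior of pudl.helpers.clean_merge_asof, and the data and column names are made up:

```python
import pandas as pd

# Made-up monthly records (the higher-frequency table goes on the left).
gen_monthly = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["a", "a", "a"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-06-01", "2021-02-01"]),
    "net_generation_mwh": [10.0, 12.0, 9.0],
})

# Made-up annual records, one per year, dated January 1 (an assumption).
gens_annual = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["a", "a"],
    "report_date": pd.to_datetime(["2020-01-01", "2021-01-01"]),
    "capacity_mw": [50.0, 55.0],
})

# merge_asof requires both frames to be sorted on the "on" key.
merged = pd.merge_asof(
    gen_monthly.sort_values("report_date"),
    gens_annual.sort_values("report_date"),
    on="report_date",
    by=["plant_id_eia", "generator_id"],
    direction="backward",  # take the most recent annual record at or before each month
)
```

Two caveats with this stand-in: it behaves like a left merge, so generators missing from the monthly table are dropped, and a backward match can silently pick up the previous year's record when the current year's annual record is missing.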