catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Some monthly MCOE outputs become annual #1088

Closed zaneselvans closed 2 years ago

zaneselvans commented 2 years ago

Currently the fuel_cost, hr_by_unit, and hr_by_gen outputs from the MCOE process have about the same number of records whether the output frequency is annual or monthly, which is just wrong. Judging by the report_date field in the output dataframes, the supposedly monthly outputs really are annual. This came up while doing the data validation checks for the v0.4.0 release (#681). For example:

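import pudl

# pudl_engine is assumed to be an existing SQLAlchemy engine connected to the PUDL database.
# display() is the IPython / Jupyter notebook display function.
pudl_monthly = pudl.output.pudltabl.PudlTabl(
    freq="MS",
    pudl_engine=pudl_engine,
    start_date="2018-01-01",
    end_date="2019-12-31",
)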
mcoe_monthly = pudl_monthly.mcoe()

print("fuel_cost:")
display(pudl_monthly.fuel_cost().report_date.value_counts())
print("\nhr_by_unit:")
display(pudl_monthly.hr_by_unit().report_date.value_counts())
print("\nhr_by_gen:")
display(pudl_monthly.hr_by_gen().report_date.value_counts())
print("\ncapacity_factor:")
display(pudl_monthly.capacity_factor().report_date.value_counts())

pudl_annual = pudl.output.pudltabl.PudlTabl(
    freq="AS",
    pudl_engine=pudl_engine,
    start_date="2018-01-01",
    end_date="2019-12-31",
)
mcoe_annual = pudl_annual.mcoe()

print("fuel_cost:")
display(pudl_annual.fuel_cost().report_date.value_counts())
print("\nhr_by_unit:")
display(pudl_annual.hr_by_unit().report_date.value_counts())
print("\nhr_by_gen:")
display(pudl_annual.hr_by_gen().report_date.value_counts())
print("\ncapacity_factor:")
display(pudl_annual.capacity_factor().report_date.value_counts())

My first guess was that this might be related to the (for now) mandatory annual frequency of the net generation allocation process that @cmgosnell has been working on, but that's not involved here anywhere.

She then suggested that it might somehow be related to a very minor tweak I made to the pudl.helpers.merge_on_date_year() function, but reverting those changes results in exactly the same behavior.

Currently I'm at a loss. I'm planning to add some defensive assertions to the MCOE calculation process that check whether the frequency of each of these dataframes matches the frequency of the pudl_out object that created it. Those checks would be good to have running in the background anyway.
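
To make that concrete, here is a rough sketch of what such a defensive check might look like. It is not PUDL code: check_report_date_freq is a hypothetical helper, it assumes the dataframe has a datetime report_date column, and it only handles the "MS" and "AS" frequency strings used above. The monthly test is a heuristic that flags output with no more than one distinct date per year.

```python
import pandas as pd


def check_report_date_freq(df, freq, date_col="report_date"):
    """Assert that the dates in date_col look consistent with freq.

    Hypothetical helper for illustration only; not part of PUDL.
    Only the "MS" (month start) and "AS" (year start) strings are handled.
    """
    dates = pd.DatetimeIndex(pd.to_datetime(df[date_col].dropna().unique()))
    if len(dates) == 0:
        # Nothing to check in an empty dataframe.
        return
    # Both frequencies label each period by its first day.
    assert (dates.day == 1).all(), f"{date_col} has dates that are not period starts"
    if freq == "AS":
        assert (dates.month == 1).all(), "annual output contains non-January dates"
    elif freq == "MS":
        # Heuristic: monthly output should have more distinct dates than years.
        assert len(dates) > dates.year.nunique(), (
            "monthly output has at most one date per year, so it looks annual"
        )
    else:
        raise ValueError(f"unsupported frequency: {freq}")
```

With something like this in place, a call such as check_report_date_freq(pudl_monthly.fuel_cost(), "MS") would raise an AssertionError for the outputs described above instead of silently returning annual data.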


zaneselvans commented 2 years ago

It turns out the main problem here was a normal merge that really needed to be a merge_asof() style merge. And then there are these other details that come up with the merge_asof() based solution... namely: