catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

`pudl.output.plants_eia860` gets run... many times in the `pudl_out` compilation tables #2034

Open cmgosnell opened 1 year ago

cmgosnell commented 1 year ago

Some of our output tables mostly just take a table and denormalize it (merging the corresponding entity table into it), but others also impute or fill in missing data.

There are also two main ways we access the output tables: through the cached table stored in `pudl_out._dfs`, or by regenerating the output table directly through the function in `pudl.output`. `plants_eia860` is accessed in a few different places via the second method, and for a while it has included an imputation step, `pudl.output.eia860.fill_in_missing_ba_codes`. As a result, this imputation runs again on every one of those calls.

I've noticed this mostly from repeatedly re-running the net generation/fuel allocation process in a notebook: a new `pudl_out` object re-runs the missing BA codes process 7 times for the mcoe output when the net generation/fuel is allocated.
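
To make the difference between the two access paths concrete, here is a minimal standard-library sketch. It is not the real PUDL code: `ToyPudlOut`, `plants_direct`, and `fill_ba_codes` are hypothetical stand-ins, the tables are plain lists of dicts rather than dataframes, and only the `_dfs` cache attribute name comes from the description above.

```python
from __future__ import annotations

# Counts how often the (hypothetical) expensive imputation actually runs.
call_count = {"fill_ba_codes": 0}


def fill_ba_codes(plants: list[dict]) -> list[dict]:
    """Toy stand-in for the imputation step: fill a missing BA code
    from another record of the same plant."""
    call_count["fill_ba_codes"] += 1
    known = {p["plant_id"]: p["ba_code"] for p in plants if p["ba_code"]}
    return [{**p, "ba_code": p["ba_code"] or known.get(p["plant_id"])} for p in plants]


def plants_direct() -> list[dict]:
    """Second access method: rebuild the table (and re-impute) on every call."""
    raw = [
        {"plant_id": 1, "year": 2019, "ba_code": "PJM"},
        {"plant_id": 1, "year": 2020, "ba_code": None},
    ]
    return fill_ba_codes(raw)


class ToyPudlOut:
    """Hypothetical output object that caches built tables in ``_dfs``."""

    def __init__(self) -> None:
        self._dfs: dict[str, list[dict]] = {}

    def plants(self, update: bool = False) -> list[dict]:
        # First access method: build once, then serve the cached copy.
        if update or "plants" not in self._dfs:
            self._dfs["plants"] = plants_direct()
        return self._dfs["plants"]


# Seven direct calls re-run the imputation seven times.
for _ in range(7):
    plants_direct()
direct_runs = call_count["fill_ba_codes"]

# Seven calls through the cache run it only once.
call_count["fill_ba_codes"] = 0
out = ToyPudlOut()
for _ in range(7):
    cached = out.plants()
cached_runs = call_count["fill_ba_codes"]

print(direct_runs, cached_runs)  # prints "7 1"
```

In this toy setup, routing the internal callers through the cached accessor instead of the direct function is what removes the repeated work; whether that is the right fix for the actual `plants_eia860` callers is a separate question.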

e-belfer commented 11 months ago

Closed by #1973? @cmgosnell