Update all_plants_ferc1 `opex_nonfuel` column to feed from all tables

catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

https://catalyst.coop/pudl

MIT License


Update all_plants_ferc1 `opex_nonfuel` column to feed from all tables #1619

Closed: aesharpe closed this issue 2 years ago

aesharpe commented 2 years ago

Discussion

Right now, the `opex_nonfuel` column only comes from the steam table. This is the column that the RMI Optimus formatted table relies on for its Total O&M Cost column. For this column to represent values from all the FERC tables, we need to do some simple calculations.

Discussed in #216 in the rmi_ferc1_eia repo.

The stated calculations are:

- small generators: `opex_nonfuel = opex_total - opex_fuel`
- hydro: `opex_nonfuel = opex_total`
- pumped storage: `opex_nonfuel = opex_total`
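The three rules above can be sketched per table in pandas. This is a minimal sketch, assuming each FERC Form 1 plant table is already a DataFrame with an `opex_total` column (plus `opex_fuel` for small generators); the helper `add_opex_nonfuel` and the `table_kind` labels are hypothetical, not part of PUDL's API.

```python
import pandas as pd


def add_opex_nonfuel(df: pd.DataFrame, table_kind: str) -> pd.DataFrame:
    """Return a copy of ``df`` with an ``opex_nonfuel`` column.

    Hypothetical helper: the ``table_kind`` labels are illustrative only.
    """
    out = df.copy()
    if table_kind == "small":
        # Small generators: subtract fuel expenses from total O&M.
        out["opex_nonfuel"] = out["opex_total"] - out["opex_fuel"]
    elif table_kind in ("hydro", "pumped_storage"):
        # Per the stated rule, all reported O&M counts as non-fuel.
        out["opex_nonfuel"] = out["opex_total"]
    elif table_kind == "steam":
        # The steam table already carries opex_nonfuel; leave it as-is.
        pass
    else:
        raise ValueError(f"unknown table_kind: {table_kind!r}")
    return out


small = pd.DataFrame({"opex_total": [100.0, 50.0], "opex_fuel": [60.0, 10.0]})
hydro = pd.DataFrame({"opex_total": [30.0]})
print(add_opex_nonfuel(small, "small")["opex_nonfuel"].tolist())  # [40.0, 40.0]
print(add_opex_nonfuel(hydro, "hydro")["opex_nonfuel"].tolist())  # [30.0]
```

Missing `opex_fuel` values propagate as NaN through the subtraction, which seems preferable to silently treating them as zero.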

Questions

This is technically a transformation. Should this live in the transform step for each of the individual tables, or in the output layer where the FERC 1 tables get combined? My vote is for the transform step so that the column gets added to those individual tables as well. This will also entail updating all the metadata to include the new column in those output tables.

cmgosnell commented 2 years ago

I like this idea! The calculations sound pretty straightforward and clear to me. I think doing these calculations in the output layer is more in line with the overall pudl conventions of doing most of the (even minor) calculations post the normalized db tables.

(It'll also be easier to integrate/update w/o reloading the full db 😎 )

aesharpe commented 2 years ago

> I think doing these calculations in the output layer is more in line with the overall pudl conventions of doing most of the (even minor) calculations post the normalized db tables.

That makes sense -- it is kind of like a teeny tiny analysis layer. In the fullness of time, however, I think it might make sense to put them in transform? I'm very ok doing it in the output layer for now though.

cmgosnell commented 2 years ago

Idk! I think we do a ton of these little calculations in the output layer right now (so much calculating of capacity_factor in pudl/output). We've always said the db tables were the tidy, cleaned tables and the outputs were where we do calculations. This may change slightly if we move our denormalized output tables into db views or otherwise publish them as db tables... but even in that design configuration we have talked about keeping the normalized/"raw"-ish pudl tables separate from the denormalized tables with simple calculations in them.
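A sketch of the output-layer option: the per-type tables stay untouched, and `opex_nonfuel` is filled in only when they are stacked into one combined table. The `plant_type` column, its labels, and the `combine_plants` function are hypothetical; PUDL's actual `all_plants_ferc1` code may be structured differently.

```python
import pandas as pd


def combine_plants(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack per-type plant tables and derive opex_nonfuel (hypothetical)."""
    # Tag each row with its source table, then stack; pd.concat fills
    # columns missing from a given table with NaN.
    combined = pd.concat(
        [df.assign(plant_type=kind) for kind, df in tables.items()],
        ignore_index=True,
    )
    # Make sure the columns used below exist even if no table had them.
    for col in ("opex_fuel", "opex_nonfuel"):
        if col not in combined.columns:
            combined[col] = float("nan")

    is_small = combined["plant_type"] == "small"
    is_hydro_like = combined["plant_type"].isin(["hydro", "pumped_storage"])
    # Steam rows keep the opex_nonfuel they were reported with.
    combined.loc[is_small, "opex_nonfuel"] = (
        combined.loc[is_small, "opex_total"] - combined.loc[is_small, "opex_fuel"]
    )
    combined.loc[is_hydro_like, "opex_nonfuel"] = combined.loc[is_hydro_like, "opex_total"]
    return combined


steam = pd.DataFrame({"opex_total": [500.0], "opex_fuel": [300.0], "opex_nonfuel": [200.0]})
small = pd.DataFrame({"opex_total": [100.0], "opex_fuel": [60.0]})
pumped = pd.DataFrame({"opex_total": [80.0]})
out = combine_plants({"steam": steam, "small": small, "pumped_storage": pumped})
print(out["opex_nonfuel"].tolist())  # [200.0, 40.0, 80.0]
```

Keeping the derivation here means the normalized db tables never change, which is the trade-off discussed above.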

aesharpe commented 2 years ago

> We've always said the db tables were the tidy, cleaned tables and the outputs were where we do calculations.

That's fair! I can get behind that. I just couldn't remember if we added things like report_year or something in transform or output.