Closed zaneselvans closed 5 years ago
@cmgosnell and I reviewed pudl.helpers and pudl.analysis.analysis and deleted a few things from helpers. We decided not to do a full review of pudl.analysis.analysis because none of these functions are being used in the codebase -- only in notebooks. So we can safely remove the module from the packaging for v0.1.0 entirely, and deal with this for the next release.
fercplants: We need this to do the annual FERC1 to EIA923 plant mapping exercise. See issue #386.
check_ferc1_tables: Another devtools function -- checks to make sure that the database schema extracted from FERC 1 for a reference year is compatible with all other years. Should be run each year when we integrate the new year of ferc1 data and start using that newest year as refyear.
steam_ferc1_by_pudl, fuel_ferc1_by_pudl, frc_by_pudl, gen_fuel_by_pudl: These all do aggregation of some columns by plant_id_pudl and/or fuel type, and annualize the EIA tables. The annual output is already available via the output objects, and the rest of the aggregation seems straightforward. None of them are used elsewhere.
generator_proportion_eia923, capacity_proportion_eia923, values_by_generator_eia923: Splitting some attributes in 923 out by generator rather than plant. Maybe interesting in the context of fossil-refi? Used in some of those old notebooks.
primary_fuel_gf_eia923, plant_fuel_proportions_gf_eia923, primary_fuel_frc_eia923, plant_fuel_proportions_frc_eia923, primary_fuel_ferc1, plant_fuel_proportions_ferc1: All of these functions are aimed at calculating the apparent fuel proportions and primary fuels of plants, as revealed by various reports of fuel delivery or consumption. The ferc1 version has been duplicated, I think, in the fbp (fuel by plant) infrastructure, which is part of the ferc1 output object infrastructure now. Seems like potentially useful/interesting stuff to integrate into some of the output tables somehow? Or to update as needed and integrate into tests?
simple_select (obsolete)
simple_ferc_plant_ids (unused, very simple)
simple_eia_plant_ids (unused, very simple)
simple_pudl_plant_ids (unused, very simple)
ferc_eia_shared_plant_ids (unused, very simple)
ferc_pudl_plant_ids (unused, very simple)
eia_pudl_plant_ids (only used in another obsolete function, very simple)
yearly_sum_eia (used by some of the old fossil-refi notebooks, but I think it's been replaced by the annual aggregation of EIA data in the output objects).
consolidate_ferc1_expns: Part of our failed attempt at simply differentiating between production and non-production expenses. Too simple to actually work.
ferc1_expns_corr: Calculates generation vs. expense correlations. Feeds into previous function. Again, this approach didn't end up being workable because of too much noise in the individual time series. Also it's not much code. Used in some old fossil-refi notebooks.
ferc_expenses: Same as previous.
Sorry this took me a while to get to...
Most of this looks right to me. I am a little hesitant to get rid of the ferc expense functions but logically you are probably correct.
iirc pretty much all of the maybe keep/look at functions were generated in service of a more granular MCOE calc and could be useful in the future. Some of these aggregations may even be useful in helping with the RMI plant mappings. I expect many of them need to be reworked or updated but we should keep them in mind before jumping in and building all new infrastructure. It might be worth revisiting them at the end of that project if it pans out.
Yeah, I know, it would be nice if the FERC expense / correlation stuff was useful but... it just doesn't seem like it is or will be. I think if/when we dive into that regression/data extension stuff we probably need to start from scratch. :-/
If you're okay with it, I will chuck everything in the "delete" bin, re-locate the 2 ferc1 infrastructure / id_mapping functions, and retain the "keep / look at" functions for now so that we can maybe use them as idea-templates for the RMI work, but mark them as #nocov / #noqa / untested.
This sounds like a good plan to me.
Done!
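For anyone picking up the retained "keep / look at" functions later, here is a rough sketch of the kind of calculation the fuel-proportion and primary-fuel functions are described as doing. It is not the code from pudl.analysis.analysis: the record layout, the function names, and the 0.5 threshold are all hypothetical, chosen only to illustrate the idea with the standard library.

```python
from collections import defaultdict


def plant_fuel_proportions(records):
    """Return {plant_id: {fuel: share of the plant's total heat content}}.

    `records` is an iterable of (plant_id, fuel_type, mmbtu) tuples. This
    layout is hypothetical, not the actual PUDL table schema.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for plant_id, fuel, mmbtu in records:
        totals[plant_id][fuel] += mmbtu

    proportions = {}
    for plant_id, by_fuel in totals.items():
        plant_total = sum(by_fuel.values())
        if plant_total <= 0:
            continue  # no usable fuel data for this plant
        proportions[plant_id] = {
            fuel: mmbtu / plant_total for fuel, mmbtu in by_fuel.items()
        }
    return proportions


def primary_fuel(proportions, threshold=0.5):
    """Return {plant_id: fuel with the largest share}.

    The value is None when the largest share is below `threshold`
    (an assumed cutoff, not one taken from the PUDL code).
    """
    result = {}
    for plant_id, shares in proportions.items():
        fuel, share = max(shares.items(), key=lambda item: item[1])
        result[plant_id] = fuel if share >= threshold else None
    return result


# Made-up deliveries for two plants.
deliveries = [
    (1, "coal", 900.0),
    (1, "gas", 100.0),
    (2, "gas", 300.0),
    (2, "oil", 300.0),
    (2, "coal", 400.0),
]
shares = plant_fuel_proportions(deliveries)
primary = primary_fuel(shares)
```

With this toy input, plant 1 comes out as a coal plant, while plant 2 has no fuel above the threshold and gets None.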
We have accumulated a fair amount of cruft (esp. in pudl.analysis) that is no longer useable, useful, or relevant to the project, and isn't going to be revived. We should delete this stuff before releasing. Anything that we do decide to keep around we should probably get into a test of some kind so the bitrot doesn't destroy it.
Nothing in pudl.analysis.analysis is currently being called at any point in the tests, though some of it was certainly getting used in notebooks.
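One way to check a claim like "nothing in this module is called from the tests" is a quick name-based scan with Python's standard ast module. The sketch below is a heuristic under stated assumptions, not something from the PUDL repo: the helper names are hypothetical, the sources are passed in as strings, and matching is by name only, so dynamic lookups (getattr, string-based dispatch) and notebook usage would be missed.

```python
import ast


def defined_functions(module_source):
    """Names of the top-level functions defined in a module's source text."""
    tree = ast.parse(module_source)
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


def referenced_names(source):
    """Every bare name or attribute name used in a piece of source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names


def unused_functions(module_source, other_sources):
    """Functions defined in the module that no other source refers to by name."""
    used = set()
    for source in other_sources:
        used |= referenced_names(source)
    return sorted(defined_functions(module_source) - used)


# Toy stand-ins for a module and a test file; the bodies are placeholders.
analysis_src = '''
def simple_select(table):
    return table

def yearly_sum_eia(df):
    return df
'''
test_src = '''
from analysis import yearly_sum_eia

def test_sum():
    assert yearly_sum_eia([1]) == [1]
'''
unused = unused_functions(analysis_src, [test_src])
```

In real use the strings would come from reading the module file and every file under the test directory. Note that an import alone does not count as a reference here; only an actual use of the name does.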