Closed bendnorman closed 5 months ago
I don't know if we want to deal with it now, but there are also a bunch of columns that do not conform to our column naming conventions that I think we should rename at some point. E.g. the rolling
averages for CapEx etc in the FERC 1.
I think the place to start would be going through fields.py
to ensure that there aren't any obvious wierdos. Identifying any fields
that appear in that module which aren't part of any resource and purging them might be a good idea.
The Census DP1 naming isn't great. It really contains state, county, and tract data (in 3 different tables / assets). If we're renaming things it seems like now would be a good time to shift it to just census dp1?
Should the exploded assets be a part of the core of our output layer?
definitely output! they are so weird and modified.
Should the exploded assets be written to the database?
yes (probably)! but we've been kinda waiting to publish them. Right now the main remaining step in the way of publishing them is actually finalizing a fourth rate base table that is much more explainable and user facing. I want to wait to actually publish the current three exploded tables until we have this rate base table because really most users should never use the exploded tables and we can point them to the rate base tables. I believe rmi-ers will want access to the three exploded tables.
What should the naming convention be for these exploded assets?
out_ferc1__yearly_exploded_{core seed table base name}
:
out_ferc1__yearly_exploded_balance_sheet_assets
out_ferc1__yearly_exploded_balance_sheet_liabilities
out_ferc1__yearly_exploded_income_statement
Notes: i think the "core seed table base name" here should not include the schedule number because these tables include multiple tables with multiple schedule numbers.
hm but maybe exploded
is too... idk not informative.. maybe it should be _detailed
or _granular_break_down
(omigosh that's long) or something? the gist is it takes a super high level data point like total assets and converts that into as much detail as is possible via the calculations/connections between records that is embedded in the FERC data. We've been calling it explosion bc it explodes a small set of info into a large, more granular view. But that might be too colloquial.
I like detailed
or granular
! IDK if those are too vague but probably more informative than exploded
.
I think i like detailed!
out_ferc1__yearly_detailed_balance_sheet_assets
out_ferc1__yearly_detailed_balance_sheet_liabilities
out_ferc1__yearly_detailed_income_statement
also i don't love the bare dp1
for the census... mostly bc idk what that means. i can look it up but it doesn't tell me anything as is. how about raw_census__demographics
or raw_census__demographics_dp1
in the same spirit of adding FERC1 schedules to the end of those asset names.
also does it matter that this asset is not a table? the sql part of it does communicate something that i personally find helpful bc without that it looks like it is just a table like the majority of the assets.
I agree a more descriptive name would be helpful. How about raw_census__demographic_profiles_dp1
?
I'm not sure I understand your question. Are you asking why raw_census__dp1
is not written to the database?
I also like detailed
! One issue is that there are ~130 references to "exploded" in the code. Should we create a separate PR for renaming all instances of exploded
to detailed
?
I like raw_census__demographic_profiles_dp1
!
I'm not sure I understand your question. Are you asking why raw_census__dp1 is not written to the database?
I mean this isn't actually a table its some weird reference to a sqlite db right?
and i personally think its fine if we keep calling it explode or explosion in the code. especially as a verb i think it makes sense. but very open to converting away from the explosion language
Oh I see. The raw_census__demographic_profiles_dp1
asset returns a path to the database. We have some other assets that don't return dataframes. I think to be consistent we should leave the storage method out of the asset name for now.
Would you be ok with having the tables use detailed
but the code use exploded
for a bit?
Oofta! I thought #3029 was the last renaming PR but it turns out there are a few straggler assets that haven't been renamed.
None of these assets are currently saved to the database. Given no one currently expects these to be in the database, I propose we hold off on renaming these assets so we can merge #2818 and start doing data-only releases ASAP.
Questions
state_average_fuel_costs_eia
asset?~For @bendnorman: I did some exploded asset renaming in https://github.com/catalyst-cooperative/pudl/pull/2914