Closed zaneselvans closed 1 year ago
I'm curious if we could use the database schema to generate a lot of these de-normalized output tables. This might be a more complicated approach that I'm definitely not attached to, but it keep worming in my head.
If most of these de-normalized tables in the outputs right now are just merging additional tables into each core table for each of the outputs based on FK relationships (i.e. merge in plants_entities_eia
into the generators_eia860
table on plant_id_eia
for the gens_eia860
output), it seems like we could probably generate all of these merges based on the DB schema.
A complication I see here is which columns from each of these merged tables to actually keep. If we are planning on generating metadata resources
for each of these output tables, this problem could be pretty easily solved by enforce_schema
.
I've moved most of this content to #1973. @cmgosnell could you repost the comment above in #1973? Thanks!
Description
Currently we provide access to more human-readable denormalized outputs using software routines. This adds a layer of complexity and requires users to use Python. It's also kind of slow. Instead, for simple derived values and denormalized tables we can provide this type of output by defining views (stored queries) inside the databases we generate and distribute.
Motivation
In Scope
PudlTabl
output methods corresponding to individual data tables likefuel_receipts_costs_eia923
orplants_steam_ferc1
.PudlTabl
class.Out of Scope
Breaking API Changes
PudlTabl
class to access the database views directly rather than doing its own calculations and joins, but in the long run as we move to providing access via Intake catalogs or the DB directly, we will probably want to deprecate this access method.