catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
471 stars 108 forks source link

Add Database Views #1178

Closed zaneselvans closed 1 year ago

zaneselvans commented 3 years ago

Description

Currently we provide access to more human-readable denormalized outputs using software routines. This adds a layer of complexity and requires users to use Python. It's also kind of slow. Instead, for simple derived values and denormalized tables we can provide this type of output by defining views (stored queries) inside the databases we generate and distribute.

Motivation

In Scope

Out of Scope

Breaking API Changes

cmgosnell commented 1 year ago

I'm curious if we could use the database schema to generate a lot of these de-normalized output tables. This might be a more complicated approach that I'm definitely not attached to, but it keep worming in my head.

If most of these de-normalized tables in the outputs right now are just merging additional tables into each core table for each of the outputs based on FK relationships (i.e. merge in plants_entities_eia into the generators_eia860 table on plant_id_eia for the gens_eia860 output), it seems like we could probably generate all of these merges based on the DB schema.

A complication I see here is which columns from each of these merged tables to actually keep. If we are planning on generating metadata resources for each of these output tables, this problem could be pretty easily solved by enforce_schema.

bendnorman commented 1 year ago

I've moved most of this content to #1973. @cmgosnell could you repost the comment above in #1973? Thanks!