Closed dataders closed 1 year ago
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Describe the feature
I feel weird even saying this out loud, but after hearing this this package is @jtcohen6's "house in the country", I realized my schnapsidee just might belong after all. So here goes.
What if in the
external
config dict, there was amaterialized
key that defaults tofalse
, but if true would stage the external table temporarily only to call the following, then drop the external table?In this way,
ext_full_refresh
would drop the materialized table and reload it, something we'd do on a daily basis before callingdbt run
in our production pipeline.The benefit of extending (read: perverting) the elegance of
stage_external_sources
is that it ensures that downstream tables can be run and queried quickly while still keeping a link to the external source that can be called again when needed.However, brain is split because:
Describe alternatives you've considered
We're also considering either:
stage_external_sources
that could accomplish the same via something like:rename
,select * into
.Additional context
Both Azure Synapse and Azure SQL support the ideal of external tables in the more familiar sense of flat files in blob or s3 (though unfortunately, with different syntax... ). But, Azure SQL also supports cross database queries (a.k.a. "vertically partitioned elastic queries" a.k.a. external data sources of type
RDBMS
) which are basically the Azure equivalent of the classic SQL Server idea of "linked servers". Enter the idea of using dbt to build data marts.We're thinking a lot about the idea of marts that are downstream of our core data warehouse for the following reasons:
Who will this benefit?
This would obvs benefit our team, and also Azure stack folks. But it looks like Redshift also has a concept of cross-database queries?
And perhaps anyone else with a handful of the bullet points above?
Are you interested in contributing this feature?
@chaerinlee1 and I have toying around this the idea with #73 but it's still a work in progress. cc: @nandanhegde15