dbt-labs / dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
https://getdbt.com
Apache License 2.0
95 stars 58 forks source link

[CT-1118] Materalize as Redshift external tables (Glue tables) #175

Closed Ans-hrf closed 2 years ago

Ans-hrf commented 2 years ago

Describe the feature

A clear and concise description of what you want to happen. With DBT we have only Table and View materialization which is fine for most of the engines. With Redshift Spectrum, it would be useful to be able to directly materialize models as external table to get a better integration of DBT with the lakehouse capabilities of Redshift.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered. A workaround solution i use currently is by materializing the model as view and then use a post hook to create an external table from the view :

{{ config(
   alias='exter',
   materialized='view',
   bind=False,
   post_hook=[{"sql": "{{ insert_to_external_table(
       this.schema,
       this.table,
       'glue_catalog_test',
       'crwiris',
       ) }}",
   "transaction": False},
   after_commit("drop view dev.dbt.exter")]
   )
}}

select * FROM "dev"."glue_catalog_test"."crwiris"

Additional context

Please include any other relevant context here.

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly. Lakehouse approach has become attractive since a while and Redshift (with Spectrum) is a major player for handling lake and warehouse data with the same engine and query formats. Many companies would benefit from the ability of DBT to handle Glue tables through Redshift not only for reading but for creating(already possible through Macros) and inserting seamlessly.

Are you interested in contributing this feature?

Let us know if you want to write some code, and how we can help.

mp24-git commented 2 years ago

Did you try with using the existing external table and the incremental materialization (without unique_key)? This just inserts all data from the model into the target table if it exists.

sanga8 commented 2 years ago

mp24-git commented 6 days ago

Did you try with using the existing external table and the incremental materialization (without unique_key)? This just inserts all data from the model into the target table if it exists.

If I'm not mistaken, the goal here would be to create external table which does not exist yet.

dbeatty10 commented 2 years ago

Thanks for reaching out about external table support @Ans-hrf !

Have you tried the dbt_external_tables package (view on GitHub)?

sanga8 commented 2 years ago

The issue with dbt_external_tables is that it works for sourcing external tables, however we'd like to have a SQL model materialized as external table.

I found the similar issue here: https://github.com/dbt-labs/dbt-redshift/issues/24

Ans-hrf commented 2 years ago

Hi @dbeatty10 @mp24-git and @sanga8 !

Yes indeed, as @sanga8 mentioned, the purpose of my demand is to have a simple process to materialize data in external tables with Redshift. dbt_external_tables allows to create external tables (using sources) but when we want to save dbt models in this tables to the best of my knowledge we don't have a simple way (for example materialize as external_table) to save model result not as a Redshift internal table/view but as an external table in Glue catalog.

In the workaround i propose, i save the model as a view with DBT then i run a post-hook to put the content of the view in an external table leveraging spectrum.

I hope the demand is clearer now, don't hesitate to challenge if not yet clear.

Thanks

Fleid commented 2 years ago

Hi all, I'm closing this ticket and re-opening #24, as a lot more context is available there. Please let me know if you think this is not a duplicate and should deserve to stay open on its own - happy to reconsider. Thanks!