brooklyn-data / dbt_artifacts

A dbt package for modelling dbt metadata. https://brooklyn-data.github.io/dbt_artifacts
Apache License 2.0
333 stars 126 forks source link

Support Redshift #60

Open amychen1776 opened 2 years ago

amychen1776 commented 2 years ago

It would be epic to have this package support Redshift. One of the things we will need to discuss is how we will recommend loading into Redshift (or S3 would be better for Redshift)

NiallRees commented 2 years ago

My initial thoughts are that we could extend the adapter for each warehouse to add an upload function which is accessible through a {% do adapter.upload_file(file_path, destination) %} or similar method in a macro/run-operation. That function can then just call any API methods needed to load the data into the warehouse.

NiallRees commented 2 years ago

We've been busy reimplementing the package, opening it up to be compatible with more adapters. Resolving this involves implementing a redshift version of each warehouse dispatch-able macro defined in https://github.com/brooklyn-data/dbt_artifacts/tree/main/macros.

ataft commented 2 years ago

The Redshift macro part is relatively easy to implement. The problem that I run into is that the type_string() logic in the source models (e.g. https://github.com/brooklyn-data/dbt_artifacts/blob/main/models/sources/models.sql#L7), which translates to cast(null as TEXT) in Redshift, is creating varchar(1) data types in the materialized tables in Redshift. This makes the insert statements fail.

Any thoughts on a solution? E.g. change null to a dummy string, specify a static length, allow the models to have different logic based on adapter?

Here are some quick tests using the create table logic that dbt writes:

create table "dev"."dbt_bi_artifacts"."model_executions"
as (
    with dummy_cte as (
        select 1 as foo
    )
    select
        cast('12345678' as VARCHAR) as col1,-- creates varchar(8)
        cast(null as VARCHAR) as col2,      -- creates varchar(1)
        cast('12345678' as TEXT) as col3,   -- creates varchar(8)
        cast(null as TEXT) as col4,         -- creates varchar(1)
        cast(null as varchar(32)) as col5   -- creates varchar(32)
    from dummy_cte
    where 1 = 0
);
kylezeeuwen commented 9 months ago

Field report on @brendan-cook-87 's fork. Works great for me !

Self hosted DBT

DBT + redshift version:

23:26:10  Running with dbt=1.7.4
23:26:11  Registered adapter: redshift=1.7.1

packages.yml:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: dbt-labs/audit_helper
    version: 0.9.0
  - package: dbt-labs/codegen
    version: 0.12.1
  - package: calogica/dbt_expectations
    version: 0.10.1
  - git: "https://github.com/brendan-cook-87/dbt_artifacts.git"
    revision: "support-redshift"

tested seed, run, snapshot. No issues. Only tyre kicked the data in the tables but looks correct.

Thanks for the code !!!

FredericoCoelhoNunes commented 9 months ago

Any updates on this topic? Is there a particular reason why Redshift is not supported?

hanslemm commented 1 month ago

Created an updated version of @brendan-cook-87 work here.

It works as expected for Redshift, so if you need support for Redshift, just tap this fork while the PR is not merged.