Open amychen1776 opened 2 years ago
My initial thoughts are that we could extend the adapter for each warehouse to add an upload function which is accessible through a
{% do adapter.upload_file(file_path, destination) %}
or similar method in a macro/run-operation. That function can then just call any API methods needed to load the data into the warehouse.
We've been busy reimplementing the package, opening it up to be compatible with more adapters. Resolving this involves implementing a redshift version of each warehouse dispatch-able macro defined in https://github.com/brooklyn-data/dbt_artifacts/tree/main/macros.
The Redshift macro part is relatively easy to implement. The problem that I run into is that the type_string()
logic in the source models (e.g. https://github.com/brooklyn-data/dbt_artifacts/blob/main/models/sources/models.sql#L7), which translates to cast(null as TEXT)
in Redshift, is creating varchar(1)
data types in the materialized tables in Redshift. This makes the insert
statements fail.
Any thoughts on a solution? E.g. change null
to a dummy string, specify a static length, allow the models to have different logic based on adapter?
Here are some quick tests using the create table
logic that dbt writes:
create table "dev"."dbt_bi_artifacts"."model_executions"
as (
with dummy_cte as (
select 1 as foo
)
select
cast('12345678' as VARCHAR) as col1,-- creates varchar(8)
cast(null as VARCHAR) as col2, -- creates varchar(1)
cast('12345678' as TEXT) as col3, -- creates varchar(8)
cast(null as TEXT) as col4, -- creates varchar(1)
cast(null as varchar(32)) as col5 -- creates varchar(32)
from dummy_cte
where 1 = 0
);
Field report on @brendan-cook-87 's fork. Works great for me !
Self hosted DBT
DBT + redshift version:
23:26:10 Running with dbt=1.7.4
23:26:11 Registered adapter: redshift=1.7.1
packages.yml:
packages:
- package: dbt-labs/dbt_utils
version: 1.1.1
- package: dbt-labs/audit_helper
version: 0.9.0
- package: dbt-labs/codegen
version: 0.12.1
- package: calogica/dbt_expectations
version: 0.10.1
- git: "https://github.com/brendan-cook-87/dbt_artifacts.git"
revision: "support-redshift"
tested seed, run, snapshot. No issues. Only tyre kicked the data in the tables but looks correct.
Thanks for the code !!!
Any updates on this topic? Is there a particular reason why Redshift is not supported?
It would be epic to have this package support Redshift. One of the things we will need to discuss is how we will recommend loading into Redshift (or S3 would be better for Redshift)