aws-samples / dbt-glue

This repository contains de dbt-glue adapter
Apache License 2.0
95 stars 61 forks source link

Glue DB output override per SQL config #68

Open pixie79 opened 2 years ago

pixie79 commented 2 years ago

I have been looking at overriding the output schema and database. Ideally we have 3 or 4 different glue databases. I have tried this within the following code, which allows me to override the S3 location but not the Glue DB:

{{ config( partition_by=['year','month','day','hour'], file_format='parquet', materialized='table', target_schema='bronze', target_database='bronze', custom_location=“s3://“ + env_var('AWS_ACCOUNT_ID') + “-bronze/data/general_ledger_posting_event", tags=["gl"] ) }}

Ideally I would like to be able to override them at a higher level in the dbt_project.yml models: projectname:

raw:

+tags:

- "raw"

+materialized: table

bronze:
  +file_format: hudi
  +tags:
    - "bronze"
  +materialized: table
pixie79 commented 2 years ago

With further testing it appears if I set "schema" in the SQL config to "bronze" but "schema" in the profile set to "raw" then the glue database is created in a db called "raw_bronze" instead of "bronze". It also then partially ignores the custom_location as the location referenced in glue becomes the default location from the profile (location: "s3://{{ env_var('AWS_ACCOUNT_ID') }}-raw") but then with a path of "raw_bronze" interjected. However, the data is actually stored correctly in S3 in the location given by the custom_location.

Setting database as an override in the config part of the sql file seems to have no effect?

It would appear this is nearly working as needed but with a small bug when schema is set, ideally if schema is set in the sql config then this should be the glue database that the table is created in and it should register the path as the one given by the custom location (if one is given, rather than the default). The output of the files is correct in s3.

jduggin-Parchment commented 1 year ago

Using the custom schema applies the behavior exactly as you describe by default, and appears to be intended behavior of dbt.
https://docs.getdbt.com/docs/build/custom-schemas#how-does-dbt-generate-a-models-schema-name

dbt provides a way of managing how this is generated by creating your own macro generate_schema_name in your project. https://docs.getdbt.com/docs/build/custom-schemas#changing-the-way-dbt-generates-a-schema-name