aws-samples / dbt-glue

This repository contains the dbt-glue adapter
Apache License 2.0
101 stars 69 forks source link

Cannot use merge_update_columns or merge_exclude_columns #423

Closed Jeremynadal33 closed 2 months ago

Jeremynadal33 commented 3 months ago

Describe the bug

Related to issue 421 if you are using a partitioned table. When trying to use merge_exclude_columns or merge_update_columns the returned columns are quoted with "col_name" instead of `col_name` as should do in Spark and then resulting in an error.

Steps To Reproduce

Create an incremental model using merge_exclude_columns ( called test_excude_columns ) :

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['id'],
        merge_exclude_columns=['inserted_at'],
        file_format='delta',
    )
}}

with incoming_data as (
    select 1 as id, {{ dbt.current_timestamp() }} as inserted_at, {{ dbt.current_timestamp() }} as modified_at
    union all
    select 2 as id, {{ dbt.current_timestamp() }} as inserted_at, {{ dbt.current_timestamp() }} as modified_at
)

select * from incoming_data

Run the following command two times (the first to create the model and the second time for having the incremental run) :

dbt run -s test_excude_columns

You should see an error like :

18:27:06  Glue adapter: Glue returned `error` for statement None for code SqlWrapper2.execute('''/* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "data_platform_preparation", "target_name": "local", "node_id": "model.data_platform_preparation.test_exclude_columns"} */

    -- back compat for old kwarg name

  merge into xxx.test_exclude_columns as DBT_INTERNAL_DEST
      using xxx.test_exclude_columns_tmp as DBT_INTERNAL_SOURCE
      on 
                  DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
      when matched then update set

            "id" = DBT_INTERNAL_SOURCE."id",
            "modified_at" = DBT_INTERNAL_SOURCE."modified_at"

      when not matched then insert *
''', use_arrow=False, location='s3://xxx'), ParseException: 
Syntax error at or near '"id"'(line 25, pos 12)

== SQL ==
/* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "data_platform_preparation", "target_name": "local", "node_id": "model.data_platform_preparation.test_exclude_columns"} */

    -- back compat for old kwarg name

  merge into xxx.test_exclude_columns as DBT_INTERNAL_DEST
      using xxx.test_exclude_columns_tmp as DBT_INTERNAL_SOURCE
      on 
                  DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
      when matched then update set
            "id" = DBT_INTERNAL_SOURCE."id",
------------^^^
            "modified_at" = DBT_INTERNAL_SOURCE."modified_at"

      when not matched then insert *

Expected behavior

We should have something similare but the merge should look like:

merge into xxx.test_exclude_columns as DBT_INTERNAL_DEST
      using xxx.test_exclude_columns_tmp as DBT_INTERNAL_SOURCE
      on 
                  DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
      when matched then update set

            "id" = DBT_INTERNAL_SOURCE."id",
            "modified_at" = DBT_INTERNAL_SOURCE."modified_at"

      when not matched then insert *

Screenshots and log output

See above sections

System information

The output of dbt --version:

Core:
  - installed: 1.8.3
  - latest:    1.8.4 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - glue:  1.8.1 - Up to date!
  - spark: 1.8.0 - Up to date!

The operating system you're using: MacBook Pro 2,3 GHz Intel Core i9 8 cœurs

The output of python --version:

Python 3.12.4

Additional context

Add any other context about the problem here.