aws-samples / dbt-glue

This repository contains the dbt-glue adapter
Apache License 2.0
96 stars 66 forks source link

merge_exclude_columns not working in iceberg incremental models #444

Open e-quili opened 1 week ago

e-quili commented 1 week ago

Describe the bug

The parameter merge_exclude_columns, which excludes the specified column from being updated with the merge strategy, seems to be ignored with iceberg tables.

Current behavior

The config below updates all the columns of the materialized table, including "created_at" column.

{{ 
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='table_id',
        file_format= 'iceberg',
        merge_exclude_columns = ['created_at']
    )
}} 

Expected behavior

The setting merge_exclude_columns='created_at' should avoid updating the created_at column.

Screenshots and log output

We don't get any error message, even when we specify in merge_exclude_columns a column that doesn't exist.

System information

The output of dbt --version:

Core:
  - installed: 1.8.6
  - latest:    1.8.6 - Up to date!

Plugins:
  - glue:   1.8.1 - Up to date!
  - spark:  1.8.0 - Up to date!

The operating system you're using: Windows

The output of python --version: Python 3.12.6

Additional context

Same issue with merge_update_columns.

showell-nex commented 1 week ago

I've been having this problem too. I've dug through the code and the offending function is located here.

The iceberg upsert functionality in general needs building out but the actually buggy code is here whereby the current timestamp is added as an update timestamp BEFORE any comparisons are done between the source and the target data, causing the upsert string to always evaluate to not matched and insert new rows every time as the update timestamp column will always have a value in it that doesn't match any previously written rows.