Closed aiss93 closed 4 weeks ago
Hi, thanks for your contribution.
Since this is breaking change, in case users are already using update_iceberg_ts
column, dbt-glue version upgrade impacts their workload.
Is it possible to think backward-compatible way? I usually prefer safer approach to add config to remove that column.
Hi @moomindani, I added a configuration parameter to control this behavior. By default, it will add the update_iceberg_ts column to ensure compatibility with existing workloads and avoid any disruptions.
Additionally, I discovered some hardcoded Spark configurations, such as:
{% call statement() %} set spark.sql.autoBroadcastJoinThreshold=-1 {% endcall %}
. This configuration will disable broadcasted joins by default, which can negatively impact performance in some cases.
I believe this should be removed, as such Spark configurations should be left for the user to set according to their specific use case and environment.
Yes, disabling spark.sql.autoBroadcastJoinThreshold
can introduce performance bottleneck. It will be better to override the config when spark.sql.autoBroadcastJoinThreshold
is set in --conf
parameter, or remove that option for future Glue version. But again, we need to be careful not to impact existing workload. And this needs to be discussed in a separate PR/issue.
Can we add test case for this PR?
resolves #452
Description
In the current implementation, when using the merge incremental strategy with the Iceberg file format, the adapter automatically adds a column update_iceberg_ts, which is not utilized in the merge process. This causes issues such as breaking schema comparisons between tables in certain scenarios, making the hard-coding of this column restrictive.
This PR removes the hard-coded update_iceberg_ts column from the adapter. If users require this column for specific use cases, they can now add it directly within their model configuration.
Checklist
CHANGELOG.md
and added information about my change to the "dbt-glue next" section.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.