resolves #330

Description

Adds a call to the macro process_schema_change after creating the temp relation and before merging, similar to dbt-spark's incremental materialization.

This allows "ignore", "fail", and "append_new_columns" as on_schema_change strategies, but not "sync_all_columns", because the macro spark__alter_relation_add_remove_columns does not support dropping columns. It could, since Delta allows dropping columns when column mapping is enabled via table properties; this will be possible once dbt-spark PR #1088 is merged.

WARNING: there is still a limitation. For example, when using the "ignore" strategy and removing columns, the update will fail because spark__get_merge_sql uses "update set *", which expects all columns to be present.

Checklist

[x] I have run this code in development and it appears to resolve the stated issue
[x] This PR includes tests, or tests are not required/relevant for this PR
[x] I have updated the CHANGELOG.md and added information about my change to the "dbt-glue next" section.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

FYI: I am not sure whether this change needs testing; if it does, I would be interested in understanding how to!

Tests that were run locally

Add a simple model with incremental strategy (test_change_schema):

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['id'],
        file_format='delta',
        on_schema_change='append_new_columns'
    )
}}

with incoming_data as (
    select 1 as id
    union all
    select 2 as id
)

select * from incoming_data
```

Run it a first time, then add a new column:

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['id'],
        file_format='delta',
        on_schema_change='append_new_columns'
    )
}}

with incoming_data as (
    select 1 as id, 'a' as name
    union all
    select 2 as id, 'b' as name
)

select * from incoming_data
```

Column is added.
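For context on where the described call sits, here is a minimal sketch of the relevant portion of the incremental materialization. The macro and variable names (process_schema_changes, tmp_relation, existing_relation) are assumptions modeled on dbt-core/dbt-spark's on_schema_change helpers, not the exact diff in this PR:

```sql
{#-- Sketch only: build the temp relation first, then handle schema drift
    before the merge. Names below are assumptions, not the PR's code. --#}
{%- set tmp_relation = make_temp_relation(this) -%}
{%- do run_query(create_table_as(True, tmp_relation, sql)) -%}

{#-- Apply the configured on_schema_change strategy
    ('ignore' | 'fail' | 'append_new_columns') before merging. --#}
{%- do process_schema_changes(on_schema_change, tmp_relation, existing_relation) -%}

{#-- The merge statement is then generated as before. --#}
```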
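Regarding the note that Delta can drop columns when the right table properties are set: a hypothetical example of those properties is shown below. The property names and version values follow Delta Lake's column-mapping requirements for DROP COLUMN and are an assumption here, not part of this PR:

```sql
-- Hypothetical: enable column mapping so Delta permits DROP COLUMN.
-- Property values per Delta Lake's column-mapping docs; not in this PR.
ALTER TABLE my_schema.test_change_schema SET TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name',
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);
```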