dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

Allow sync all columns for Delta incremental models #1088

Open Jeremynadal33 opened 3 months ago

Jeremynadal33 commented 3 months ago

resolves #594 docs dbt-labs/docs.getdbt.com/#

Problem

For an incremental model using the Delta file format, sync_all_columns could not be used as the on_schema_change setting, because Delta originally did not support dropping columns. It now does (see issue 594 for details), but only when certain table properties are set.

Solution

As suggested in issue 594, I added a check that compares the current table properties with the table properties required for dropping columns. It raises an error when trying to remove columns without the right table properties.

If the table properties are correct, it first adds the new columns and then removes the columns that no longer exist in the model.
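The flow described above can be sketched roughly as follows. This is a plain Python illustration, not the macro code from this PR: the function names (can_drop_columns, plan_sync_all_columns) are hypothetical, and the required properties are an assumption based on Delta's column mapping requirements (delta.columnMapping.mode set to name, reader version at least 2, writer version at least 5), which may differ from what the PR actually checks.

```python
# Hypothetical sketch of the check-then-sync logic; names are illustrative only.

def can_drop_columns(table_properties):
    """Return True if the (assumed) Delta properties for DROP COLUMN are set."""
    try:
        reader = int(table_properties.get("delta.minReaderVersion", 0))
        writer = int(table_properties.get("delta.minWriterVersion", 0))
    except ValueError:
        # A non-numeric protocol version is treated as "not supported".
        return False
    mode = table_properties.get("delta.columnMapping.mode")
    return mode == "name" and reader >= 2 and writer >= 5


def plan_sync_all_columns(relation, table_properties, add_columns, remove_columns):
    """Build the ALTER TABLE statements: additions first, then removals.

    add_columns is a list of (name, data_type) pairs,
    remove_columns is a list of column names.
    """
    # Fail before emitting anything if a removal is requested but unsupported.
    if remove_columns and not can_drop_columns(table_properties):
        raise ValueError(
            f"Cannot drop columns on {relation}: "
            "required Delta table properties are not set"
        )

    statements = []
    if add_columns:
        cols = ", ".join(f"{name} {dtype}" for name, dtype in add_columns)
        statements.append(f"ALTER TABLE {relation} ADD COLUMNS ({cols})")
    if remove_columns:
        cols = ", ".join(remove_columns)
        statements.append(f"ALTER TABLE {relation} DROP COLUMNS ({cols})")
    return statements
```

Doing the property check before building any statement means a failed run leaves the table untouched, rather than adding columns and then failing on the drop.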

DISCLAIMERS :

Any feedback on this one would be much appreciated, since this is my first open source PR on such a big project. Thanks in advance!

Checklist

sp-cveeragandham commented 2 months ago

Any update on this PR? We have implemented an override macro as a workaround, but it would be nice if this were fixed in the adapter codebase.

ggng-jaz commented 2 months ago

Hi there, is there an update on this PR? We have the same issue with incremental models in databricks in https://github.com/databricks/dbt-databricks/issues/780 , thanks.

Jeremynadalmirakl commented 2 months ago

I apologize, I don't have time to investigate further at the moment. If anyone wants to contribute, feel free to copy the code, or tell me and I will give you access to my repo so you can push... Otherwise, I will come back to this as soon as I have enough time 🙏