ajsquared closed this issue 3 weeks ago
This is also using the insert_overwrite incremental strategy.
A couple of things of interest on this issue:
If possible, can you try 1.8.2rc1 and see if it fixes the issue? If not, please share the relevant log with me via email: ben.cassell@databricks.com
Okay, I'll try out 1.8.2rc1 and let you know
I see the same behavior on 1.8.2rc1. I'll share log files after I've tested #695 too
I've sent the logs over now
Ok, here is what is going on. This is a consequence of spark.sql.sources.partitionOverwriteMode, which surprisingly affects create or replace in addition to insert overwrite. So what I can do in the adapter is set the property to static on full-refresh and to dynamic on incremental runs, and I think this should work for most reasonable use cases.
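For readers following along, the proposed behavior can be sketched in a few lines of Python. This is only an illustration of the decision described in the comment above, not the adapter's actual code: the function names are hypothetical, and only the spark.sql.sources.partitionOverwriteMode property and its static and dynamic values come from the thread.

```python
def partition_overwrite_mode(full_refresh: bool) -> str:
    """Pick the overwrite mode as proposed: static on full-refresh, dynamic otherwise."""
    return "static" if full_refresh else "dynamic"


def set_statement(full_refresh: bool) -> str:
    """Build the Spark SQL SET statement for the chosen mode (hypothetical helper)."""
    mode = partition_overwrite_mode(full_refresh)
    return f"SET spark.sql.sources.partitionOverwriteMode = {mode}"


if __name__ == "__main__":
    # A --full-refresh run would use static mode before recreating the table.
    print(set_statement(full_refresh=True))
    # A regular incremental run would use dynamic mode so that only the
    # partitions present in the new data are overwritten.
    print(set_statement(full_refresh=False))
```

Keeping the mode choice in one small function means the full-refresh and incremental paths cannot silently disagree about which setting is in effect.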
Thanks for reporting this bug; I think your mitigation is easy enough, just blow away the existing table outside of dbt, but this report will improve the adapter.
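As a rough sketch of the mitigation mentioned above, the snippet below builds a DROP TABLE IF EXISTS statement that could be run outside of dbt before the next full refresh. The helper names and the catalog, schema, and table names are placeholders rather than anything from this report, and how the statement is executed (notebook, SQL editor, or client library) is left open.

```python
def quote_identifier(part: str) -> str:
    """Quote one identifier part with backticks, doubling any embedded backtick."""
    return "`" + part.replace("`", "``") + "`"


def drop_table_statement(*name_parts: str) -> str:
    """Build a DROP TABLE IF EXISTS statement for a possibly qualified table name."""
    if not name_parts:
        raise ValueError("at least one name part is required")
    qualified = ".".join(quote_identifier(p) for p in name_parts)
    return f"DROP TABLE IF EXISTS {qualified}"


if __name__ == "__main__":
    # Placeholder names; substitute the real catalog, schema, and model table.
    print(drop_table_statement("my_catalog", "my_schema", "my_model"))
```

Once the table is gone, the next dbt run has nothing left to overwrite and simply creates it from scratch.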
Ah I see, that makes sense. I think that approach sounds good; it will be great if --full-refresh just works™!
Thanks for the quick fix on this! OOC, do you have an estimate of when 1.8.2 would be released?
Most likely the week after next, as we have Data and AI Summit next week. I can push a second release candidate if it would help to have this available as a prerelease on PyPI before then.
After the summit is totally fine, just wanted to plan on my end when to push out the next upgrade
Describe the bug
When converting a non-incremental table to an incremental table, a --full-refresh leaves old data in the table.
Steps To Reproduce
We had a DBT model that was initially non-incremental. We decided to add partitioning and turn it into an incremental model to improve performance. After making this change, we ran the model with the --full-refresh flag.
However, we saw the old table data still present: count(*) on the table returned twice as many rows as expected.
This might be the same underlying issue as #695, but I'm not sure so I've filed this separately.
Expected behavior
Based on https://docs.getdbt.com/docs/build/incremental-models#how-do-i-rebuild-an-incremental-model, I'd expect DBT to drop the existing table, so any data currently in the table would be removed.
System information
The output of dbt --version:
The operating system you're using: Linux
The output of python --version: Python 3.11.7