databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Implement microbatch incremental strategy #825

Closed benc-db closed 1 month ago

benc-db commented 1 month ago

Resolves #824

Description

Implements the microbatch incremental strategy: https://docs.getdbt.com/docs/build/incremental-microbatch

The core idea is that dbt determines slices of time and breaks one insert into multiple statements; we run a replace-where for each slice so that any old data in that slice is replaced by the newest version of that data. This makes it much easier for users to backfill and, on failure, to rerun only the slices that failed.
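As a rough illustration of the slicing idea (not dbt's actual implementation), the sketch below splits a time range into daily half-open windows and runs each one independently, collecting the windows that fail so that only those need a rerun. The names `daily_slices`, `run_batches`, and `run_one` are hypothetical, and the one-day batch size is an assumption.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, List, Tuple

Window = Tuple[datetime, datetime]


def daily_slices(start: datetime, end: datetime) -> Iterator[Window]:
    """Yield half-open [slice_start, slice_end) windows covering [start, end)."""
    step = timedelta(days=1)  # assumed batch size; hypothetical
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


def run_batches(slices: Iterable[Window], run_one: Callable[[Window], None]) -> List[Window]:
    """Run every slice on its own and return the slices that failed."""
    failed: List[Window] = []
    for window in slices:
        try:
            run_one(window)
        except Exception:
            # A failed slice does not stop the others; it is recorded for a rerun.
            failed.append(window)
    return failed
```

Passing the returned list back into `run_batches` reruns only the failed windows, which is the retry behavior described above.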

I have to cast the column to TIMESTAMP: if your event_time column is a date, Databricks casts the conditions to date, and the predicate then looks like replace where date >= X and date < X, which cannot match any rows.
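To make the cast concrete, here is a minimal sketch of how a replace-where predicate for one slice might be built with the event-time column cast to TIMESTAMP. The function name `replace_where_predicate` and the exact string format are assumptions for illustration; the SQL the adapter actually generates may differ.

```python
from datetime import datetime


def replace_where_predicate(event_time: str, start: datetime, end: datetime) -> str:
    """Build a half-open [start, end) predicate, casting the column to TIMESTAMP.

    Hypothetical helper: casting the column keeps the comparison at timestamp
    precision even when the column itself is a DATE.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    column = f"CAST({event_time} AS TIMESTAMP)"
    return (
        f"{column} >= '{start.strftime(fmt)}' "
        f"AND {column} < '{end.strftime(fmt)}'"
    )


predicate = replace_where_predicate(
    "event_date", datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 12, 0)
)
```

With a half-day slice like the one above, truncating both bounds to a date would give the same value on each side, which is the empty `date >= X and date < X` case described in the text.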

I also hit an issue with column comments, which I think was introduced in dbt-core 1.9.0b2; I have fixed it here.

Checklist