databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Streaming table partitions aren't picked up on subsequent runs #716

Closed: kdazzle closed this 2 months ago

kdazzle commented 3 months ago

Describe the bug

I have a streaming table with partitions (code below). The initial run works fine and partitions the data as expected. However, subsequent runs either drop and recreate the table (the behavior a couple of weeks ago) or fail with an error saying the partition columns can't be changed (the behavior as of around today).

Not sure where the issue lies. I suspect it could be in this repo, but I haven't dug into the code. I also have an email thread going with dbt support.

Thank you!

Steps To Reproduce

$ dbt run --select=my_streaming_table
$ dbt run --select=my_streaming_table

Expected behavior

Subsequent runs should add new records to the existing streaming table.

Screenshots and log output

Error message (from the Databricks DLT pipeline, not from dbt):

org.apache.spark.sql.catalyst.ExtendedAnalysisException: Cannot change partition columns for table __materialization_mat_my_streaming_table_1.
Current: environment, date
Requested:
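The error reads like a guard that compares the table's existing partition columns with the ones requested by the new pipeline definition. The sketch below is a hypothetical illustration of that kind of check (the names `check_partition_change` and `PartitionChangeError` are assumptions, not DLT's implementation); it shows why an empty "Requested:" list is enough to make a second run fail.

```python
from typing import Sequence


class PartitionChangeError(Exception):
    """Hypothetical error raised when a definition changes partition columns."""


def check_partition_change(table: str, current: Sequence[str], requested: Sequence[str]) -> None:
    # Hypothetical guard: compare the column lists in order; any difference is rejected.
    if list(current) != list(requested):
        raise PartitionChangeError(
            f"Cannot change partition columns for table {table}.\n"
            f"Current: {', '.join(current)}\n"
            f"Requested: {', '.join(requested)}"
        )


if __name__ == "__main__":
    # First run created the table partitioned by environment, date.
    existing = ["environment", "date"]
    # Second run: the definition arrives with no partition columns at all.
    try:
        check_partition_change("__materialization_mat_my_streaming_table_1", existing, [])
    except PartitionChangeError as err:
        print(err)
```

Under this assumption, the same table definition passes on every run only if the requested columns match the existing ones exactly.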

System information

Additional context

Sample model:

-- my_streaming_table.sql

{{
  config(
    persist_docs={"relation": false, "columns": false},
    materialized = 'streaming_table',
    partition_by='environment,date',
    tblproperties = {
        'pipelines.autoOptimize.zOrderCols': 'id,status',
    }
  )
}}

select
  id,
  timestamp,
  data,
  status,
  environment,
  date
from stream {{ ref('my_upstream_source') }}
{% if target.name not in ['prod'] %}
where date > DATE_SUB(CURRENT_DATE(), 1)
{% endif %}

benc-db commented 3 months ago

Can you email me a dbt.log of this happening? ben.cassell@databricks.com

benc-db commented 3 months ago

partition_by doesn't look formatted correctly. It should be a list of column names:

partition_by=['environment', 'date']

Does making this change work?
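One plausible reason the string form misbehaves (not confirmed in this thread): if the adapter wraps a plain string partition_by as a single entry, 'environment,date' becomes one "column" literally named environment,date, which never matches the table's actual partition columns ['environment', 'date']. The hypothetical helpers `as_partition_list` and `partitions_match` below mimic that wrap-a-string assumption to show the mismatch; they are an illustration, not dbt-databricks code.

```python
from typing import List, Sequence, Union

PartitionBy = Union[str, Sequence[str], None]


def as_partition_list(partition_by: PartitionBy) -> List[str]:
    """Hypothetical normalizer: wraps a bare string as one entry, copies lists."""
    if partition_by is None:
        return []
    if isinstance(partition_by, str):
        # The whole string becomes ONE entry; commas are not split.
        return [partition_by]
    return list(partition_by)


def partitions_match(existing: Sequence[str], partition_by: PartitionBy) -> bool:
    # A config-diff step comparing column names would treat these as different.
    return list(existing) == as_partition_list(partition_by)


if __name__ == "__main__":
    existing = ["environment", "date"]
    print(as_partition_list("environment,date"))                # ['environment,date']
    print(partitions_match(existing, "environment,date"))       # False
    print(partitions_match(existing, ["environment", "date"]))  # True
```

With the list format, the normalized value lines up column-for-column with the existing partitioning, so a comparison like this would report no change.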

kdazzle commented 2 months ago

Hey @benc-db - thanks for the response. I just got around to trying that again. Looks like it works fine on 1.8.3 with the list format. I'm 99.9% sure we tried that a few weeks back (on 1.8.0?), since that's what I had in an example I was pointing people to. But who knows.

Anyways, I'll close this - thanks again!