databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Snapshot + UniForm is broken since today's Databricks release #819

jelmerk closed this 1 month ago

jelmerk commented 1 month ago

Describe the bug

We have a number of dbt snapshot models and use this configuration, which sets up the output tables for use with UniForm:

snapshots:
  foo_source_product:
    foo:

      +target_schema: foo_source_product_snapshots

      +post-hook:
        - "vacuum {{ this }}"

      +tblproperties:
        delta.columnMapping.mode: name
        delta.minReaderVersion: 3
        delta.minWriterVersion: 7
        delta.universalFormat.enabledFormats: iceberg
        delta.enableIcebergCompatV2: true
        delta.feature.timestampNtz: supported

After today's Databricks release in the US, all of our snapshot builds started failing with the following error:

Table properties delta.universalFormat.enabledFormats and delta.universalFormat.v1Formats.location are only allowed for managed Delta tables.

This happens because, as part of the snapshot process, dbt-databricks creates a staging view and applies the configured tblproperties to it.

You end up with something like:

create
or replace view `data_platform_prd`.`foo_source_product_snapshots`.`my_table__dbt_tmp`
tblproperties (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '3',
  'delta.minWriterVersion' = '7',
  'delta.universalFormat.enabledFormats' = 'iceberg',
  'delta.enableIcebergCompatV2' = 'True',
  'delta.feature.timestampNtz' = 'supported'
) as select 1 as test;

Databricks now rejects this because the view is not a managed Delta table. Before today's release this worked fine.
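To make the failure mode concrete, here is a minimal Python sketch of how a statement of this shape can come about. It is not the adapter's actual code: `render_create_view` is a hypothetical helper, and the idea that the YAML `true` reaches the SQL as `'True'` through Python string formatting is an assumption based on the generated statement above.

```python
from typing import Optional


def render_create_view(relation: str, select_sql: str,
                       tblproperties: Optional[dict] = None) -> str:
    """Hypothetical helper: build a CREATE OR REPLACE VIEW statement.

    When tblproperties is given, each key/value pair is rendered as a quoted
    string, so a Python bool True ends up as 'True' in the SQL text.
    """
    parts = [f"create or replace view {relation}"]
    if tblproperties:
        props = ",\n".join(f"  '{k}' = '{v}'" for k, v in tblproperties.items())
        parts.append(f"tblproperties (\n{props}\n)")
    parts.append(f"as {select_sql}")
    return "\n".join(parts)


# Properties taken from the snapshot config in this issue (subset).
uniform_props = {
    "delta.universalFormat.enabledFormats": "iceberg",
    "delta.enableIcebergCompatV2": True,
}

# Shape of the failing statement: table-only properties attached to a view.
print(render_create_view("`catalog`.`schema`.`my_table__dbt_tmp`",
                         "select 1 as test", uniform_props))

# Shape of the statement without properties: a plain view.
print(render_create_view("`catalog`.`schema`.`my_table__dbt_tmp`",
                         "select 1 as test"))
```

The second call shows the form a temporary staging view needs: no table properties at all, since the view is dropped again once the snapshot has been merged.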

System information

The output of dbt --version:

dbt --version
Core:
  - installed: 1.8.7
  - latest:    1.8.7 - Up to date!

Plugins:
  - databricks: 1.8.6 - Up to date!
  - spark:      1.8.0 - Up to date!

The operating system you're using: macOS / Linux inside Docker

The output of python --version: Python 3.11.4

jelmerk commented 1 month ago

I created a fix here

As a workaround, you can create the file macros/default_override/databricks_build_snapshot_staging_table.sql in your dbt project with the following content:

{% macro databricks_build_snapshot_staging_table(strategy, sql, target_relation) %}
    {% set tmp_identifier = target_relation.identifier ~ '__dbt_tmp' %}

    {%- set tmp_relation = api.Relation.create(identifier=tmp_identifier,
                                               schema=target_relation.schema,
                                               database=target_relation.database,
                                               type='view') -%}

    {% set select = snapshot_staging_table(strategy, sql, target_relation) %}

    {# needs to be a non-temp view so that its columns can be ascertained via `describe` #}
    {% call statement('build_snapshot_staging_relation') %}

        create or replace view {{ tmp_relation }}
        as
            {{ select }}

    {% endcall %}

    {% do return(tmp_relation) %}
{% endmacro %}
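
One way to check that the override took effect is to scan the SQL that dbt actually ran for view statements that still carry a tblproperties clause. The sketch below is a hypothetical helper, not part of dbt or dbt-databricks, and it assumes you have already captured the SQL text yourself (for example from dbt's debug log).

```python
import re

# Matches CREATE [OR REPLACE] VIEW <name> followed directly by a
# TBLPROPERTIES clause. Whitespace and newlines between keywords are flexible.
_VIEW_WITH_PROPS = re.compile(
    r"create\s+(?:or\s+replace\s+)?view\s+(?P<name>\S+)\s+tblproperties\s*\(",
    re.IGNORECASE,
)


def views_with_tblproperties(sql_text: str) -> list[str]:
    """Return the names of views created with a TBLPROPERTIES clause."""
    return [m.group("name") for m in _VIEW_WITH_PROPS.finditer(sql_text)]


# Statement shape from the bug report: flagged.
failing = """create
or replace view `data_platform_prd`.`foo_source_product_snapshots`.`my_table__dbt_tmp`
tblproperties (
  'delta.universalFormat.enabledFormats' = 'iceberg'
) as select 1 as test;"""
print(views_with_tblproperties(failing))

# Statement shape produced by the workaround macro: not flagged.
fixed = """create or replace view `data_platform_prd`.`foo_source_product_snapshots`.`my_table__dbt_tmp`
as
    select 1 as test"""
print(views_with_tblproperties(fixed))
```

An empty result for the snapshot staging views suggests the overriding macro is the one being used.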

benc-db commented 1 month ago

By today's release, do you mean dbt-databricks 1.8.6, or a change in the databricks runtime? I've looked at your PR, and I agree, setting metadata on tmp views is silly.

jelmerk commented 1 month ago

By today's release, do you mean dbt-databricks 1.8.6, or a change in the databricks runtime?

I think neither.

It's part of how the SQL warehouses function. As I understand it, they are not bound to a DBR release and are updated by Databricks at will, with no option to roll back to a previous release.

We have two workspaces, one in the EU and one in the US. This morning it stopped working for us in the US, but as of now it still works in our EU workspace.

Our Databricks representative told us that Databricks rolls out changes region by region, so I guess this will hit the EU soon unless they roll this back.