databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
199 stars 107 forks source link

How do we merge schemas? #725

Open sugendran opened 2 weeks ago

sugendran commented 2 weeks ago

Describe the bug

When using dbt_project_evaluator I get the error below. I have set spark.databricks.delta.schema.autoMerge.enabled to true, but it's not evolving the schema.

Ultimately I'd love to know what setting I can add to make this merge the schema

07:55:15    Runtime Error in model base_source_columns (models/staging/graph/base/base_source_columns.sql)
  [_LEGACY_ERROR_TEMP_DELTA_0007] A schema mismatch detected when writing to the Delta table (Table ID: da497304-2797-46fe-9185-037ee80ec708).
  To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
  '.option("mergeSchema", "true")'.
  For other operations, set the session configuration
  spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
  specific to the operation for details.

  Table schema:
  root
  -- node_unique_id: string (nullable = true)
  -- name: string (nullable = true)
  -- description: string (nullable = true)
  -- data_type: string (nullable = true)
  -- quote: string (nullable = true)

  Data schema:
  root
  -- node_unique_id: string (nullable = true)
  -- name: string (nullable = true)
  -- description: string (nullable = true)
  -- data_type: string (nullable = true)
  -- quote: string (nullable = true)
  -- col6: boolean (nullable = true)
  -- col7: integer (nullable = true)
  -- col8: string (nullable = true)

  Table ACLs are enabled in this cluster, so automatic schema migration is not allowed. Please use the ALTER TABLE command for changing the schema. 

Steps To Reproduce

I added dbt_project_evaluator and it started failing.

models:
  dbt_project_evaluator:
    +schema: dbt_project_evaluator
    +materialized: table
    +databricks_tags:
      project: dbt
    +tblproperties:
      mergeSchema: true

# https://dbt-labs.github.io/dbt-project-evaluator/0.8/#additional-setup-for-databrickssparkduckdbredshift
dispatch:
  - macro_namespace: dbt
    search_order: ['dbt_project_evaluator', 'dbt']

System information

The output of dbt --version:

➜  main git:(develop) dbt --version
Core:
  - installed: 1.8.3
  - latest:    1.8.3 - Up to date!

Plugins:
  - databricks: 1.8.3 - Up to date!
  - redshift:   1.8.1 - Up to date!
  - postgres:   1.8.2 - Up to date!
  - duckdb:     1.8.1 - Up to date!
  - spark:      1.8.0 - Up to date!

The operating system you're using: OSX 14.3.1 (Sonoma)

The output of python --version: python --version

benc-db commented 2 weeks ago
 Table ACLs are enabled in this cluster, so automatic schema migration is not allowed. Please use the ALTER TABLE command for changing the schema.

I am not familiar with the dbt_project_evaluator, but the above statement is the key. We do not generally use schema evolution today, as there are many edge cases that do not work as expected. The current patterns are: