databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Runtime Error [WRONG_COMMAND_FOR_OBJECT_TYPE] after bumping from 1.8.1 to 1.8.2 #712

Closed: RobertasPetrauskas1 closed this 5 days ago

RobertasPetrauskas1 commented 6 days ago

Describe the bug

After Databricks Workflows bumped the adapter version to 1.8.2, our gold-layer dimensional models started failing in both the prod and staging environments. Some context for clarity:

When running `dbt build --target staging` we expect the output in the gold layer to be a view, but dbt issues `drop table if exists ...` commands, resulting in: `Runtime Error [WRONG_COMMAND_FOR_OBJECT_TYPE] The operation DROP TABLE requires a EXTERNAL or MANAGED. But ... is a VIEW. Use DROP VIEW instead. SQLSTATE: 42809`

When running `dbt build --target prod` we expect the output in the gold layer to be a table, but dbt issues `drop view if exists ...` commands, resulting in: `Runtime Error Error running query: org.apache.spark.sql.AnalysisException: Cannot drop a table with DROP VIEW. Please use DROP TABLE instead`

Searching the Databricks query history, I can't find prior `drop view if exists` or `drop table if exists` queries from runs of these models, so this change must have been introduced in 1.8.2. I'm not sure why this happens only on some gold models (3 of 5 dimension tables fail, along with downstream fact tables, which are skipped). One thing I noticed: if I delete all of the gold tables in the gold schema in UC, the first `dbt build` run succeeds, but consecutive runs fail with the same errors.
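As a point of comparison, the drop could be made type-aware by checking the catalog before issuing the statement. This is only a sketch of a workaround macro (`safe_drop` is a hypothetical name, not adapter code); dbt's built-in `adapter.drop_relation(relation)` is the idiomatic equivalent, since its default implementation drops by the relation's recorded type:

```sql
{# Hypothetical workaround macro (not adapter code): look up the relation's
   current type in the catalog and issue the matching drop statement. #}
{% macro safe_drop(database, schema, identifier) %}
    {%- set rel = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}
    {%- if rel is not none -%}
        {%- if rel.is_view -%}
            {% do run_query('drop view if exists ' ~ rel) %}
        {%- else -%}
            {% do run_query('drop table if exists ' ~ rel) %}
        {%- endif -%}
    {%- endif -%}
{% endmacro %}
```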

Steps To Reproduce

So far I have only reproduced this against our own code locally: installing dbt-databricks 1.8.2 and running `dbt build` produces the same errors, while downgrading to 1.8.1 makes the models run successfully. I don't have time to set up a minimal test environment right now, but I may later if no obvious fix is found.
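The local check amounts to the following (package versions as published on PyPI; the outcomes are what I observed on our project):

```shell
# Reproduce on the current adapter version
pip install 'dbt-databricks==1.8.2'
dbt build --target staging   # fails with WRONG_COMMAND_FOR_OBJECT_TYPE on second run

# Downgrade to the last known-good version
pip install 'dbt-databricks==1.8.1'
dbt build --target staging   # succeeds
```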

Expected behavior

`dbt build` to work as it did before.

Screenshots and log output

I'm not sure whether to dump the whole log file here; I can add it later as a separate comment if needed.

System information

The output of `dbt --version`:

Core:
  - installed: 1.8.3
  - latest:    1.8.3 - Up to date!

Plugins:
  - databricks: 1.8.2 - Up to date!
  - spark:      1.8.0 - Up to date!

The operating system you're using: macOS Sonoma 14.4.1

The output of `python --version`: Python 3.10.12

Additional context

Macros for UC structure based on env:

get_custom_alias.sql:

-- based on https://medium.com/data-manypets/how-to-customise-dbts-model-naming-for-easy-developing-on-production-1238559a939a
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    {%- if custom_alias_name -%}
        {%- if 'dev' in target.name -%}
            {{ node.unrendered_config.schema | trim }}_{{ custom_alias_name | trim }}
        {%- else -%}
            {{ custom_alias_name | trim }}
        {%- endif -%}
    {%- elif node.version -%}
        {{ return(node.name ~ "_v" ~ (node.version | replace(".", "_"))) }}
    {%- else -%}
        {%- if 'dev' in target.name -%}
            {{ node.unrendered_config.schema | trim }}_{{ node.name }}
        {%- else -%}
            {{ node.name }}
        {%- endif -%}
    {%- endif -%}
{%- endmacro %}
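For example (names here are illustrative, not from our actual project), the macro resolves aliases like this:

```sql
-- Illustrative renderings of generate_alias_name (example names only):
--   target.name contains 'dev', schema: gold, model dim_customer  ->  gold_dim_customer
--   target.name = 'staging' or 'prod', model dim_customer         ->  dim_customer
--   versioned model dim_customer, version 2                       ->  dim_customer_v2
```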

get_custom_schema.sql:

{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}
    {%- if ('prod' in target.name or 'staging' in target.name or 'test' in target.name) and custom_schema_name is not none -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ default_schema }}
    {%- endif -%}

{%- endmacro %}

dbt_project.yml:

name: 'my_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'my_project'

# These configurations specify where dbt should look for different types of files.
# For Databricks asset bundles, we put everything in transformations, as you may have
# non-dbt resources in your project.
model-paths: ["transformations/models"]
analysis-paths: ["transformations/analyses"]
test-paths: ["transformations/tests"]
seed-paths: ["transformations/seeds"]
macro-paths: ["transformations/macros"]
snapshot-paths: ["transformations/snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"

# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views by default. These settings can be overridden in the
# individual model files using the `{{ config(...) }}` macro.

models:
  dbt_artifacts:
    +schema: dbt_artifacts
    staging:
      +schema: dbt_artifacts
    sources:
      +schema: dbt_artifacts

  my_project:
    silver:
      +schema: silver

    gold:
      +schema: gold
      materialized: "{{ 'table' if 'prod' in target.name else 'view' }}"

on-run-start:
  - '{{ create_udfs() }}'

on-run-end:
  - "{{ dbt_artifacts.upload_results(results) }}"

benc-db commented 5 days ago

@RobertasPetrauskas1 can you email me a dbt.log: ben.cassell@databricks.com

I suspect that somewhere in migrating to using the system catalog for metadata, we're missing a catalog name.
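(If you want to sanity-check what Unity Catalog has on record for a relation's type, the system catalog exposes it; the schema/table names below are placeholders:)

```sql
-- Placeholder names: check the relation type UC has on record.
SELECT table_catalog, table_schema, table_name, table_type
FROM system.information_schema.tables
WHERE table_schema = 'gold'
  AND table_name = 'dim_customer';
-- table_type is 'VIEW' for views and 'MANAGED' or 'EXTERNAL' for tables.
```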

RobertasPetrauskas1 commented 5 days ago

Done. Let me know if you need anything else.

benc-db commented 5 days ago

Got it, will release a fix shortly.