dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Standard flow doesn't work in Databricks with Unity Catalog #190

Open bochkarevnv opened 1 year ago

bochkarevnv commented 1 year ago

Describe the bug

When I use the package in Databricks with Unity Catalog, the external tables are dropped and re-created on every run.

Steps to reproduce

Create spec like this:

version: 2

sources:
  - name: my_external_table
    catalog: foo
    schema: bar
    tables:
      - name: external_table
        external:
          location: '...'
          using: parquet

Run it the first time with dbt run-operation stage_external_sources. Then run it one more time.

Expected results

On the first run the external table is created; on the second run there is nothing to do.

Actual results

On the first run the external table is created; on the second run it is dropped and re-created.

Screenshots and log output

Log (only main lines)

/* {"app": "dbt", "dbt_version": "1.4.1", "dbt_databricks_version": "1.4.1", "databricks_sql_connector_version": "2.3.0", "profile_name": "main_one", "target_name": "prod", "connection_name": "macro_stage_external_sources"} */
show tables in `bar`
Databricks adapter: <class 'databricks.sql.exc.ServerOperationError'>: [SCHEMA_NOT_FOUND] The schema `bar` cannot be found
Databricks adapter: Error while running:
macro show_tables

with database=, schema=bar, relations=[]
1 of 1 (1) drop table if exists `foo`.`bar`.`external_table`

1 of 10 (2) create table `foo`.`bar`.`external_table`

System information

The contents of your packages.yml file:

Which database are you using dbt with?

The output of dbt --version:

Core:                                            
  - installed: 1.4.1                             
  - latest:    1.4.4 - Update available!

  Your version of dbt-core is out of date!       
  You can find instructions for upgrading here:  
  https://docs.getdbt.com/docs/installation      

Plugins:
  - databricks: 1.4.1 - Update available!
  - spark:      1.4.1 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using: Windows/Linux

The output of python --version: Python 3.11.1

Additional context

I think the problem is near https://github.com/dbt-labs/dbt-external-tables/blob/main/macros/plugins/spark/get_external_build_plan.sql#L6

grindheim commented 1 year ago

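@bochkarevnv We also experienced this behaviour. The cause is that the spark plugin's implementation in get_external_build_plan.sql passes none for database when it looks up the existing relation. With Unity Catalog a database (the catalog) does exist, so source_node.database should be passed instead. As the log above shows, the lookup runs show tables in `bar` without the catalog, the schema is not found, and the table is therefore treated as missing and re-created.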

This seems to be the only change required to make it work as expected. One could implement a full new plugin for Databricks here, but that might be overkill given that the rest of the files would be exactly the same. Alternatively, one could add a databricks__get_external_build_plan macro to the spark plugin's existing get_external_build_plan.sql file.

In the short term, you can override the macro locally by creating a new macro SQL file in your project and adding the code below:

{% macro databricks__get_external_build_plan(source_node) %}

    {% set build_plan = [] %}

    {# Pass the catalog as database so the existing relation is found under Unity Catalog #}
    {% set old_relation = adapter.get_relation(
        database = source_node.database,
        schema = source_node.schema,
        identifier = source_node.identifier
    ) %}

    {% set create_or_replace = (old_relation is none or var('ext_full_refresh', false)) %}

    {% if create_or_replace %}
        {% set build_plan = build_plan + [
            dbt_external_tables.dropif(source_node), 
            dbt_external_tables.create_external_table(source_node)
        ] %}
    {% else %}
        {% set build_plan = build_plan + dbt_external_tables.refresh_external_table(source_node) %}
    {% endif %}

    {% set recover_partitions = dbt_external_tables.recover_partitions(source_node) %}
    {% if recover_partitions %}
    {% set build_plan = build_plan + [
        recover_partitions
    ] %}
    {% endif %}

    {% do return(build_plan) %}

{% endmacro %}
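The effect of the database argument on the build plan can be sketched in plain Python. This is only a toy model of the branch in the macro above, not dbt code: EXISTING, get_relation and build_plan are hypothetical names made up for illustration, and the recover_partitions step is left out.

```python
# Toy model only: these names are hypothetical and not part of dbt
# or dbt-external-tables.
from typing import Optional

# Relations that "exist" in the warehouse, keyed by (catalog, schema, table).
EXISTING = {("foo", "bar", "external_table")}


def get_relation(database: Optional[str], schema: str, identifier: str):
    """Return the matching key, or None when the lookup cannot resolve the table."""
    key = (database, schema, identifier)
    return key if key in EXISTING else None


def build_plan(database: Optional[str], schema: str, identifier: str,
               full_refresh: bool = False) -> list:
    """Mirror the macro's branch: drop and create when no relation is found."""
    old_relation = get_relation(database, schema, identifier)
    if old_relation is None or full_refresh:
        return ["drop", "create"]
    return ["refresh"]


# Lookup without a catalog (database is None): the table is never found,
# so every run plans a drop followed by a create.
print(build_plan(None, "bar", "external_table"))

# Lookup with the catalog passed: the existing table is found,
# so the second run only plans a refresh.
print(build_plan("foo", "bar", "external_table"))
```

In this sketch the first call plans a drop and create on every run, which matches the behaviour reported in the issue, while the second call only refreshes unless full_refresh is set.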
github-actions[bot] commented 11 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

bochkarevnv commented 11 months ago

Still relevant

github-actions[bot] commented 5 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

bochkarevnv commented 5 months ago

Still relevant

grindheim commented 4 months ago

This should be fixed by https://github.com/dbt-labs/dbt-external-tables/pull/236