elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0
1.87k stars 158 forks source link

databricks - error creating elementary.dbt_columns/models/sources tables #1635

Open jakub-auger opened 1 month ago

jakub-auger commented 1 month ago

Describe the bug Error thrown for dbt_columns, dbt_models and _dbt_sources table creation during first dbt run after elementary is added to the dbt project

03:53:00 Completed with 3 errors and 0 warnings: 03:53:00 03:53:00 Runtime Error in model dbt_columns (models\edr\dbt_artifacts\dbt_columns.sql) 03:53:00 [RequestId=4c2efc34-3ea5-4d1b-9afa-155f5ecae9be ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@.dfs.core.windows.net/elementary/dbt_columns' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_columns. 03:53:00 03:53:00 Runtime Error in model dbt_models (models\edr\dbt_artifacts\dbt_models.sql) 03:53:00 [RequestId=33c14b44-302b-48c9-a765-da35ae379a12 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@.dfs.core.windows.net/elementary/dbt_models' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_models. 03:53:00 03:53:00 Runtime Error in model dbt_sources (models\edr\dbt_artifacts\dbt_sources.sql) 03:53:00 [RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.

To Reproduce Steps to reproduce the behavior:

  1. dbt run --select elementary

Expected behavior no errors

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Plugins:

Additional context This is a clean install I'm using external tables

tried updating dbt-core and databricks, but same error

(dbt-dev) C:\git\aic_datalakehouse>dbt -v using legacy validation callback Core:

Plugins:

NoyaArie commented 1 month ago

It looks like it failed to create these tables.

[RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.

From what I see, this is a Databricks error regarding privileges: https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html

jakub-auger commented 1 month ago

@NoyaArie thanks for looking into it

that's strange as I am using a single account/token (mine - admin) to run it. None of the other models in the project have problems

in fact those 3 tables (along with the rest - 24 all up?) are created

the error is referring to dbt/elementary trying to create tables that overlap the same physical location in my blob/datalake storage location

ok found it in the logs

looks like elementary is trying to create the temp/staging table in the same external location as the final table triggering the error. Allowing it would cause the final table to be overwritten with the staging data.

Are there any known workarounds?

`13:38:50.641894 [debug] [Thread-2 (]: On model.elementary.dbt_columns: / {"app": "dbt", "dbt_version": "1.8.3", "dbt_databricks_version": "1.8.3", "databricks_sql_connector_version": "3.1.2", "profile_name": "aic_datalakehouse", "target_name": "prod", "node_id": "model.elementary.dbt_columns"} /

    create or replace table `datalakehouse`.`elementary`.`dbt_columns__tmp_20240719040850594589`

  using delta

location 'abfss://datalakehouse@xxxxxx.dfs.core.windows.net/elementary/dbt_columns'

  as

    SELECT

        *

    FROM `datalakehouse`.`elementary`.`dbt_columns`
    WHERE 1 = 0

`