databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
225 stars 119 forks source link

Cannot materialize `view` models shared via Delta Sharing #781

Open shivonchain opened 2 months ago

shivonchain commented 2 months ago

Describe the bug

When trying to materialize a view in dbt via dbt run or dbt build, and the view is shared via Delta Sharing, the materialization fails:

Runtime Error in model my_view_model (models/my_view_model.sql)
  [RequestId=xxxxx ErrorClass=INVALID_STATE] 
TABLE datacloud_prod_catalog.due_diligence_silver_views.my_view_model cannot be deleted 
because it is being shared via Delta Sharing. It is shared through the following shares: my_share. 
If you just want to update a shared view, please use ALTER VIEW instead.

To update a view definition when the underlying view is shared over Delta Sharing, ALTER VIEW should be used instead of CREATE OR REPLACE, which is issued by dbt for view models. Ideally, this is baked into the default view materialization in dbt-databricks.

An alternative I have considered is to use pre- and post-hooks to remove and add the view back to the underlying share. However, this is not always feasible.

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

  1. Create a view model in dbt.
  2. Share the view model with a data recipient via Delta Sharing.
  3. Run the view model using dbt run -s <model>.

Expected behavior

I expect the dbt view to be materialized even if it is shared by Delta Sharing. The view materialization should check if the view exists; if it does, then ALTER VIEW should be issued to Databricks instead of CREATE OR REPLACE to update the view definition.

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

The output of dbt --version:

Core:
  - installed: 1.8.2
  - latest:    1.8.6 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.8.1 - Update available!
  - spark:      1.8.0 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using: macOS, M3 Max

The output of python --version: Python 3.12.3

Additional context

Add any other context about the problem here.

dataders commented 2 months ago

@benc-db for when you're back from vacation

benc-db commented 2 months ago

This is an interesting edge case that will require some significant testing. The safest change would be to only use alter in place of 'create or replace' if the table is being delta-shared. Is there a tblproperty that indicates this?

shivonchain commented 2 months ago

@benc-db I don't believe so, the closest piece of metadata I found was by calling the REST API and grabbing the storage_location of the view.

databricks tables get $TABLE_FQN | jq .storage_location

For Delta-shared tables/views, the storage_location URI will begin with uc-deltasharing://.

I'd like to see if I can grab storage_location with SQL instead, that way it could be incorporated into the materialization logic like you suggest

henlue commented 1 month ago

My team experiences the same issue. We would be glad about a solution as well.

@shivonchain If I'm not mistaken then the storage_location can only be used from the recipients side to check if a view was received via delta sharing. To resolve the issue we would need to know from the sharing side if a view is part of a share.

The only way I found to check if a view is being delta-shared is using a requests to the shares api: databricks shares list and databricks shares get <share_name> --include-shared-data. The result would be a list of all catalogs, schemas and relations that are being delta-shared. It would be possible to check whether the current view is part of this.

@benc-db would it maybe be better to try to execute the create or replace statement and, if it fails with the error above, retry it using an alter statement?