dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0
395 stars 221 forks

[ADAP-920] [ADAP-919] [Bug] Delta table metadata changed/concurrent update #892

Open. colin-rogers-dbt opened this issue 1 year ago.

colin-rogers-dbt commented 1 year ago

Is this a new bug in dbt-spark?

Current Behavior

Seeing intermittent failures when building Delta tables on v1.4:

Error from server: error code: '0' error message: 'org.apache.hive.service.cli.HiveSQLException: Error running query: io.delta.exceptions.MetadataChangedException: The metadata of the Delta table has been changed by a concurrent update. Please try the operation

Expected Behavior

The operation succeeds.

Steps To Reproduce

TBD

Relevant log output

No response

Environment

- OS:
- Python:
- dbt-core:
- dbt-spark:

Additional Context

No response

jeremyyeo commented 1 year ago

For anyone else running into this: we would love to collect additional anecdata.

  1. Modify the default query_comment macro:
-- macros/query_comment.sql

{% macro query_comment(node) %}
    {%- set comment_dict = {} -%}
    {%- do comment_dict.update(
        app='dbt',
        dbt_version=dbt_version,
        profile_name=target.get('profile_name'),
        target_name=target.get('target_name'),
        dbt_invocation_id=invocation_id,
        dbt_cloud_job_id=env_var('DBT_CLOUD_JOB_ID', 'not-a-dbt-cloud-job'),
        dbt_cloud_run_id=env_var('DBT_CLOUD_RUN_ID', 'not-a-dbt-cloud-run')
    ) -%}
    {%- if node is not none -%}
      {%- do comment_dict.update(
        file=node.original_file_path,
        node_id=node.unique_id,
        node_name=node.name,
        resource_type=node.resource_type,
        package_name=node.package_name,
        relation={
            "database": node.database,
            "schema": node.schema,
            "identifier": node.identifier
        }
      ) -%}
    {% else %}
      {%- do comment_dict.update(node_id='internal') -%}
    {%- endif -%}
    {% do return(tojson(comment_dict)) %}
{% endmacro %}
  2. Use it in dbt_project.yml:
# dbt_project.yml

name: my_dbt_project
config-version: 2
version: 1.0

models:
  my_dbt_project:
    +materialized: table

query-comment: "{{ query_comment(node) }}"

The query comment should then appear as a SQL comment in the query history, indicating which dbt Cloud job/run or dbt invocation a particular duplicated DDL statement is tied to.
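Once these comments are in the query history, they can be grouped to spot relations that were written by more than one invocation at about the same time. The following is a minimal sketch, assuming dbt's default behavior of prepending the comment as `/* ... */` to each statement (i.e. `append` is not set) and assuming the query texts have already been exported from the warehouse query history as plain strings. The helper names `extract_comment` and `find_concurrent_ddl` and the sample data are hypothetical, for illustration only. A relation that shows up under several `dbt_invocation_id` values is a candidate for the overlapping runs that may be behind the `MetadataChangedException`.

```python
from __future__ import annotations

import json
import re
from collections import defaultdict
from typing import Optional

# Assumption: dbt prepends the JSON comment as a leading /* ... */ block.
# The non-greedy match extends until a "}" that is directly followed by "*/",
# so the nested "relation" object is captured whole.
COMMENT_RE = re.compile(r"^\s*/\*\s*(\{.*?\})\s*\*/", re.DOTALL)


def extract_comment(sql: str) -> Optional[dict]:
    """Return the parsed JSON query comment at the start of `sql`, or None."""
    match = COMMENT_RE.match(sql)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def find_concurrent_ddl(queries: list[str]) -> dict[str, set[str]]:
    """Map each relation to the invocation IDs that issued statements against it,
    keeping only relations touched by more than one invocation."""
    by_relation: dict[str, set[str]] = defaultdict(set)
    for sql in queries:
        comment = extract_comment(sql)
        if comment is None or "relation" not in comment:
            continue  # e.g. node_id='internal' queries carry no relation
        rel = comment["relation"]
        # Skip null parts (e.g. database is often null on Spark/Databricks).
        key = ".".join(
            str(rel[part]) for part in ("database", "schema", "identifier") if rel.get(part)
        )
        # dbt_cloud_run_id could be used instead to group by dbt Cloud run.
        by_relation[key].add(comment.get("dbt_invocation_id", "unknown"))
    return {rel: ids for rel, ids in by_relation.items() if len(ids) > 1}


# Hypothetical sample: two invocations rebuilding the same table, plus one internal query.
sample = [
    '/* {"dbt_invocation_id": "run-1", "relation": {"database": null, "schema": "analytics", "identifier": "orders"}} */\n'
    "create or replace table analytics.orders as select 1",
    '/* {"dbt_invocation_id": "run-2", "relation": {"database": null, "schema": "analytics", "identifier": "orders"}} */\n'
    "create or replace table analytics.orders as select 1",
    '/* {"dbt_invocation_id": "run-1", "node_id": "internal"} */\nshow databases',
]
print(find_concurrent_ddl(sample))
```

Grouping by relation rather than by statement keeps the check independent of the exact DDL dbt emits; adding query start times from the history would narrow it further to genuinely overlapping runs.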

github-actions[bot] commented 1 month ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.