dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[Bug] `spark__list_relations_without_caching` expects legacy `schema` field #1048

Open JCZuurmond opened 5 months ago

JCZuurmond commented 5 months ago

Is this a new bug in dbt-spark?

Current Behavior

spark__list_relations_without_caching expects the legacy field `relation.schema`:

{% macro spark__list_relations_without_caching(relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation.schema }} like '*'
  {% endcall %}

  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}

Expected Behavior

spark__list_relations_without_caching should render the relation itself:

{% macro spark__list_relations_without_caching(relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation }} like '*'
  {% endcall %}

  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}

Steps To Reproduce

N.A.

Relevant log output

No response

Environment

Irrelevant

Additional Context

See Spark SQL migration guide

jtcohen6 commented 4 months ago

Hey @JCZuurmond, good to hear from you!

Here's my understanding of the situation:

I think the right next step is to support catalog and namespace as official aliases for database and schema, respectively.
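A minimal sketch of what such aliasing could look like (the `Relation` class below is a hypothetical stand-in, not dbt's actual `BaseRelation`; the property names follow the proposal above):

```python
# Hypothetical sketch: expose `catalog` and `namespace` as read-only
# aliases for `database` and `schema` on a simplified relation object.
class Relation:
    def __init__(self, database: str, schema: str, identifier: str):
        self.database = database
        self.schema = schema
        self.identifier = identifier

    @property
    def catalog(self) -> str:
        # proposed alias: catalog -> database
        return self.database

    @property
    def namespace(self) -> str:
        # proposed alias: namespace -> schema
        return self.schema
```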

Is that something you'd be interested in contributing?

stegus64 commented 1 month ago

This issue is the root cause of this problem: https://github.com/dbt-labs/spark-utils/issues/38

This code does not work any more:

https://github.com/dbt-labs/spark-utils/blob/f792c519e68b64e3411508bfa5f41a02e8646372/macros/maintenance_operation.sql#L4

{% for database in spark__list_schemas('not_used') %}
  {% for table in spark__list_relations_without_caching(database[0]) %}

The value returned by `spark__list_schemas()` is the result of `SHOW DATABASES`, which contains only a single column named `databaseName`.

This means that `relation.schema` in `spark__list_relations_without_caching` renders as an empty string, so

show table extended in {{ relation.schema }} like '*'

produces invalid SQL and causes a syntax error.
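The failure mode can be reproduced outside dbt with a few lines of Jinja (a minimal sketch, assuming Jinja2's default silent `Undefined`; `my_db` is a made-up schema name standing in for `database[0]`):

```python
# Minimal reproduction: the values yielded by spark__list_schemas are
# plain strings, so `.schema` on them is undefined and, under Jinja2's
# default Undefined, silently renders as an empty string.
from jinja2 import Template

current = Template("show table extended in {{ relation.schema }} like '*'")
expected = Template("show table extended in {{ relation }} like '*'")

relation = "my_db"  # what database[0] actually is: a string, not a Relation

print(current.render(relation=relation))
# -> show table extended in  like '*'   (schema missing: Spark SQL syntax error)

print(expected.render(relation=relation))
# -> show table extended in my_db like '*'
```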

I am not sure why `.schema` was added in #972. For my purposes, simply changing `relation.schema` to `relation` fixes the issue.

I do not know what other problems such a change might cause.

It seems that #972 is a breaking change.