dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[ADAP-802] [Bug] Unable to read Iceberg tables when using session connection #869

Open joleyjol opened 1 year ago

joleyjol commented 1 year ago

Is this a new bug in dbt-spark?

Current Behavior

A dbt-spark project using the session connection method is unable to read Iceberg tables from Glue catalog due to a pyspark.sql.utils.AnalysisException with desc = SHOW TABLE EXTENDED is not supported for v2 tables .

I did some digging, and I think the issue is related to the exception_handler in connections.py.

In particular, this block:

        except Exception as exc:
            logger.debug("Error while running:\n{}".format(sql))
            logger.debug(exc)
            if len(exc.args) == 0:
                raise

I've verified that my job is hitting the len(exc.args) == 0 condition. This is probably because I'm using the session connection method, though I haven't confirmed that the connection method is the cause.
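For context on why that branch can fire: Python only populates exc.args with positional constructor arguments, so an exception class that takes its message as a keyword argument and stores it on an attribute (as pyspark's AnalysisException does with desc) can surface with an empty args tuple. A minimal stand-in (FakeAnalysisException is hypothetical, not the real pyspark class):

```python
# Hypothetical stand-in for pyspark.sql.utils.AnalysisException, which keeps
# its message on a `desc` attribute rather than in Exception.args.
class FakeAnalysisException(Exception):
    def __init__(self, desc=None):
        self.desc = desc  # message lives here, not in self.args

try:
    # Constructed via keyword, so no positional args reach Exception.__new__
    # and exc.args stays empty -- the condition the handler checks.
    raise FakeAnalysisException(
        desc="SHOW TABLE EXTENDED is not supported for v2 tables."
    )
except Exception as exc:
    print(len(exc.args))  # 0 -> the handler falls into the bare `raise`
    print(exc.desc)
```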

I was able to work around this error in my local environment by raising a DbtRuntimeError with the desc from the original exception, instead of just re-raising the original exception itself.

Is there any reason this method should ever re-raise the original error instead of a DbtRuntimeError?
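A minimal sketch of the shape of that workaround, not the actual patch; DbtRuntimeError is stubbed here for self-containment (the real class lives in dbt.exceptions), and FakeAnalysisException again stands in for pyspark's class:

```python
# Stand-in for dbt.exceptions.DbtRuntimeError, just for this sketch.
class DbtRuntimeError(Exception):
    pass


def handle(exc):
    """Wrap a message-less exception instead of re-raising it bare, so that
    dbt's message-matching logic (e.g. the Iceberg v2 check) can still see it."""
    if len(exc.args) == 0:
        # Pull the message from `desc`, where pyspark's AnalysisException
        # keeps it, falling back to str(exc) for other message-less errors.
        raise DbtRuntimeError(str(getattr(exc, "desc", None) or exc))
    raise exc


class FakeAnalysisException(Exception):  # hypothetical stand-in
    def __init__(self, desc=None):
        self.desc = desc


try:
    handle(FakeAnalysisException(
        desc="SHOW TABLE EXTENDED is not supported for v2 tables."
    ))
except DbtRuntimeError as e:
    print(e)  # SHOW TABLE EXTENDED is not supported for v2 tables.
```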

Expected Behavior

The pyspark.sql.utils.AnalysisException should have been wrapped in a DbtRuntimeError, and thus handled by the existing logic that checks for this specific error message to deal with Iceberg table metadata properly.

Steps To Reproduce

1) Run dbt-spark in a project configured with the session connection method
2) Run a model that reads an Iceberg table from Glue
3) Observe that the run fails due to a pyspark.sql.utils.AnalysisException

Relevant log output

20:04:36.994919 [error] [MainThread]: Encountered an error:
SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#21, tableName#22, isTemporary#23, information#24]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@50b0402d, [dbt_iceberg_db]
20:04:37.002637 [error] [MainThread]: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 87, in wrapper
    result, success = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 143, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 172, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 219, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/requires.py", line 259, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/cli/main.py", line 278, in docs_generate
    results = task.run()
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/generate.py", line 206, in run
    compile_results = CompileTask.run(self)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 468, in run
    result = self.execute_with_hooks(selected_uids)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 428, in execute_with_hooks
    self.before_run(adapter, selected_uids)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 415, in before_run
    self.populate_adapter_cache(adapter)
  File "/opt/conda/lib/python3.10/site-packages/dbt/task/runnable.py", line 406, in populate_adapter_cache
    adapter.set_relations_cache(self.manifest)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 473, in set_relations_cache
    self._relations_cache_for_schemas(manifest, required_schemas)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 450, in _relations_cache_for_schemas
    for relation in future.result():
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/utils.py", line 465, in connected
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/spark/impl.py", line 213, in list_relations_without_caching
    show_table_extended_rows = self.execute_macro(LIST_RELATIONS_MACRO_NAME, kwargs=kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 1054, in execute_macro
    result = macro_function(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 21, in macro
  File "/opt/conda/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 33, in macro
  File "/opt/conda/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 330, in __call__
    return self.call_macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 52, in macro
  File "/opt/conda/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/base/impl.py", line 290, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/sql/connections.py", line 146, in execute
    _, cursor = self.add_query(sql, auto_begin)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/sql/connections.py", line 80, in add_query
    cursor.execute(sql, bindings)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/spark/session.py", line 208, in execute
    self._cursor.execute(sql)
  File "/opt/conda/lib/python3.10/site-packages/dbt/adapters/spark/session.py", line 110, in execute
    self._df = spark_session.sql(sql)
  File "/opt/conda/lib/python3.10/site-packages/pyspark/sql/session.py", line 1034, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self)
  File "/opt/conda/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/opt/conda/lib/python3.10/site-packages/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#21, tableName#22, isTemporary#23, information#24]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@50b0402d, [dbt_iceberg_db]

Environment

- OS: Ubuntu 22.04.1 LTS
- Python: 3.10.8
- dbt-core: 1.6.0
- dbt-spark: 1.6.0

Additional Context

No response

ben-schreiber commented 11 months ago

@joleyjol this looks similar to #837. Does the fix there solve this issue as well?

joleyjol commented 10 months ago

It looks like this should resolve my issue as well, thanks

tanweipeng commented 7 months ago

@joleyjol, just to confirm: the issue you raised is about wrapping the exception in a DbtRuntimeError, not about the SHOW TABLE EXTENDED is not supported for v2 tables error itself, right?

github-actions[bot] commented 1 month ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.