Closed: dataders closed this issue 6 months ago
I have seen this happen with SparkSession as well when using the `show` command...
Not sure if there's still interest in this, but looking into the PyHive code, it doesn't seem to handle queries with empty result sets correctly. I've forked the repo and opened a PR here, but the library seems to have been largely unmaintained for a few years now.
With the changes, Jinja is able to compile and results are received correctly.
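Until a fix lands upstream, a caller-side workaround is possible. This is a hypothetical sketch (`safe_fetchall` and the stand-in cursor are illustrative names of mine, not the actual PR's changes or a real dbt/PyHive API):

```python
# Hypothetical workaround sketch, not the actual PR: translate the
# TypeError that PyHive raises for statements with no result set into
# an empty row list.
def safe_fetchall(cursor):
    try:
        return cursor.fetchall()
    except TypeError:
        # PyHive's _fetch_more iterates over a response attribute that is
        # None when the statement returns no result set (see the
        # stacktrace in the issue body).
        return []


class EmptyResultCursor:
    """Stand-in that mimics PyHive's failure mode on an empty result set."""

    def fetchall(self):
        raise TypeError("'NoneType' object is not iterable")


print(safe_fetchall(EmptyResultCursor()))  # []
```
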
```
❯ dbt run-operation stage_external_sources --log-level debug --print
01:23:03 Running with dbt=1.6.7
01:23:03 running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/home/lmarcondes/.dbt', 'fail_fast': 'True', 'warn_error': 'True', 'log_path': '/home/lmarcondes/Documents/projects/votacao-2022/src/capivara-etl-models/capivara/logs', 'debug': 'False', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'target_path': 'None', 'invocation_command': 'dbt run-operation stage_external_sources --log-level debug --print', 'send_anonymous_usage_stats': 'False'}
01:23:03 Registered adapter: spark=1.6.0
01:23:03 checksum: a051d2bc88277f3be74306f0393e0e8e6f29724fe11a36c13ebfccd4b87560d8, vars: {}, profile: , target: , version: 1.6.7
01:23:03 Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
01:23:03 Partial parsing enabled, no changes found, skipping parsing
01:23:03 Found 1 model, 5 sources, 0 exposures, 0 metrics, 557 macros, 0 groups, 0 semantic models
01:23:03 Acquiring new spark connection 'macro_stage_external_sources'
01:23:03 Spark adapter: NotImplemented: add_begin_query
01:23:03 Spark adapter: NotImplemented: commit
01:23:03 1 of 5 START external source default.caged_for
01:23:03 On "macro_stage_external_sources": cache miss for schema ".default", this is inefficient
01:23:03 Using spark connection "macro_stage_external_sources"
01:23:03 On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.6.7", "profile_name": "capivara", "target_name": "local", "connection_name": "macro_stage_external_sources"} */
show table extended in default like '*'
01:23:03 Opening a new connection, currently in state init
01:23:03 Spark adapter: Poll status: 2, query complete
01:23:03 SQL status: OK in 0.0 seconds
01:23:03 While listing relations in database=, schema=default, found: caged_exc, caged_for, caged_mov, links_2o_turno
01:23:03 1 of 5 (1) refresh table default.caged_for
01:23:03 Using spark connection "macro_stage_external_sources"
01:23:03 On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.6.7", "profile_name": "capivara", "target_name": "local", "connection_name": "macro_stage_external_sources"} */
refresh table default.caged_for
01:23:08 Spark adapter: Poll status: 1, sleeping
01:23:13 Spark adapter: Poll status: 1, sleeping
01:23:18 Spark adapter: Poll status: 1, sleeping
01:23:23 Spark adapter: Poll status: 1, sleeping
01:23:28 Spark adapter: Poll status: 1, sleeping
01:23:33 Spark adapter: Poll status: 1, sleeping
01:23:38 Spark adapter: Poll status: 1, sleeping
01:23:43 Spark adapter: Poll status: 1, sleeping
01:23:48 Spark adapter: Poll status: 1, sleeping
01:23:53 Spark adapter: Poll status: 1, sleeping
01:23:58 Spark adapter: Poll status: 1, sleeping
01:24:03 Spark adapter: Poll status: 1, sleeping
01:24:08 Spark adapter: Poll status: 1, sleeping
01:24:12 Spark adapter: Poll status: 2, query complete
01:24:12 SQL status: OK in 69.0 seconds
01:24:12 1 of 5 (1) OK
01:24:12 2 of 5 START external source default.caged_mov
01:24:12 2 of 5 (1) refresh table default.caged_mov
01:24:12 Using spark connection "macro_stage_external_sources"
01:24:12 On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.6.7", "profile_name": "capivara", "target_name": "local", "connection_name": "macro_stage_external_sources"} */
```
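The "Poll status: 1, sleeping" lines in the log above show the adapter checking the query's state every few seconds until it reports complete. A minimal sketch of that polling pattern; the status codes (1 = still running, 2 = complete) are inferred from the log, and `wait_for_query` is my name, not dbt-spark's actual implementation:

```python
import time

# Sketch of the polling loop visible in the log: check the operation
# status at a fixed interval until the query reports complete.
def wait_for_query(get_status, interval_s=5.0, timeout_s=300.0):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_status() == 2:    # "Poll status: 2, query complete"
            return
        time.sleep(interval_s)   # "Poll status: 1, sleeping"
    raise TimeoutError("query did not complete within timeout")


# Usage with a fake status source that completes on the third poll:
statuses = iter([1, 1, 2])
wait_for_query(lambda: next(statuses), interval_s=0.0)
```
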
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this a regression in a recent version of dbt-spark?
Current Behavior
reports & discussion
@sid-deshmukh originally opened https://github.com/dbt-labs/dbt-external-tables/issues/234, but I believe this issue to be with dbt-spark, not dbt-external-tables.
@timvw and @jelstongreen also reported in a #db-databricks-and-spark thread in Community Slack that they were experiencing similar issues.
For reference, here's our internal dbt Labs Slack thread.
stacktrace
Compiling fails with the following stacktrace. dbt calls `.get_result_from_cursor()`, which calls `cursor.fetchall()`, which PyHive dispatches to its `Cursor._fetch_more()` (pyhive/hive.py#L507), where it fails.

full stacktrace
```py
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/clients/jinja.py", line 302, in exception_handler
  yield
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/clients/jinja.py", line 257, in call_macro
  return macro(*args, **kwargs)
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/runtime.py", line 763, in __call__
  return self._invoke(arguments, autoescape)
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/runtime.py", line 777, in _invoke
  rv = self._func(*arguments)
File "", line 52, in macro
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/sandbox.py", line 393, in call
  return __context.call(__obj, *args, **kwargs)
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/runtime.py", line 298, in call
  return __obj(*args, **kwargs)
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/base/impl.py", line 290, in execute
  return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/sql/connections.py", line 149, in execute
  table = self.get_result_from_cursor(cursor, limit)
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/sql/connections.py", line 129, in get_result_from_cursor
  rows = cursor.fetchall()
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/spark/connections.py", line 197, in fetchall
  return self._cursor.fetchall()
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/common.py", line 137, in fetchall
  return list(iter(self.fetchone, None))
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/common.py", line 106, in fetchone
  self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/common.py", line 46, in _fetch_while
  self._fetch_more()
File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/hive.py", line 481, in _fetch_more
  zip(response.results.columns, schema)]
TypeError: 'NoneType' object is not iterable
```

Expected/Previous Behavior

Things work (ostensibly because PyHive's `cursor.fetch()` does not invoke `._fetch_more()` the way `.fetchmany()` does).

Steps To Reproduce

method: thrift
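The bottom frame of the stacktrace fails because part of the Thrift fetch response is `None` when a statement produces no result set, and because PyHive's `fetchall()` is `list(iter(self.fetchone, None))`, the error surfaces through the whole fetch chain. A minimal Python-level illustration with a stand-in for the response object (no live Thrift server needed; the attribute names follow the traceback, everything else is an assumption):

```python
from types import SimpleNamespace

# Stand-in for the Thrift fetch response of a statement that returns no
# result set: the column payload comes back as None instead of [].
response = SimpleNamespace(results=SimpleNamespace(columns=None))

caught = None
try:
    # Mirrors the failing iteration in pyhive/hive.py's _fetch_more.
    for column in response.results.columns:
        pass
except TypeError as exc:
    caught = str(exc)

print(caught)  # 'NoneType' object is not iterable

# The guard a fix needs before iterating: treat None as "no columns".
columns = response.results.columns or []
assert list(columns) == []
```
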
Relevant log output
No response
Environment
Additional Context
A recurrence of this problem could be prevented by https://github.com/dbt-labs/dbt-core/issues/8471.