dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[Bug] Cannot run unit tests against Spark/Hudi, receiving "NoneType is not iterable" error #1047

Open KLarrabee-Arcadia opened 6 months ago

KLarrabee-Arcadia commented 6 months ago

Is this a new bug in dbt-core?

Current Behavior

We are attempting to run dbt unit tests against a local dockerized Spark/Hudi container. dbt run succeeds, but we cannot for the life of us get dbt test to work, despite following the docs and searching through existing issues. Whatever we do (assuming the test spec is valid; otherwise we get appropriate hints such as "need given") results in 'NoneType' object is not iterable errors:

17:34:38    Runtime Error in unit_test test_characters (models/unit_tests/test_characters.yml)
  An error occurred during execution of unit test 'test_characters'. There may be an error in the unit test definition: check the data types.
   Compilation Error
    'NoneType' object is not iterable

One of my coworkers was successfully running the unit tests against DuckDB, so we assumed the test specs were fine.

Expected Behavior

The test should pass/fail rather than error out.

Steps To Reproduce

I created a very simple repository where (1) dbt run succeeds and (2) dbt test results in the error above (see the repo README for steps to reproduce both the success and the failure).

Relevant log output

After locally adding import traceback; print(traceback.format_exc()) right before this line, I observed the following stack trace:

16:28:21  Traceback (most recent call last):
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/base.py", line 368, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/base.py", line 314, in compile_and_execute
    result = self.run(ctx.node, manifest)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/base.py", line 415, in run
    return self.execute(compiled_node, manifest)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/test.py", line 265, in execute
    unit_test_node, unit_test_result = self.execute_unit_test(test, manifest)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/task/test.py", line 225, in execute_unit_test
    macro_func()
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/clients/jinja.py", line 84, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
    return macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 61, in macro
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
    return __obj(*args, **kwargs)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/clients/jinja.py", line 84, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
    return macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 33, in macro
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
    return __obj(*args, **kwargs)
  File "/Users/kevinlarrabee/projects/pydeps/dbt-core/core/dbt/clients/jinja.py", line 84, in __call__
    return self.call_macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
    return macro(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 52, in macro
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
    return __obj(*args, **kwargs)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt/adapters/base/impl.py", line 350, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt/adapters/sql/connections.py", line 159, in execute
    table = self.get_result_from_cursor(cursor, limit)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/dbt/adapters/sql/connections.py", line 141, in get_result_from_cursor
    rows = cursor.fetchall()
  File "/Users/kevinlarrabee/projects/pydeps/dbt-spark/dbt/adapters/spark/connections.py", line 251, in fetchall
    return self._cursor.fetchall()
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/common.py", line 142, in fetchall
    return list(iter(self.fetchone, None))
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/common.py", line 111, in fetchone
    self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/common.py", line 51, in _fetch_while
    self._fetch_more()
  File "/Users/kevinlarrabee/Library/Caches/pypoetry/virtualenvs/<app-name>-PMbYIgg4-py3.9/lib/python3.9/site-packages/pyhive/hive.py", line 507, in _fetch_more
    zip(response.results.columns, schema)]
TypeError: 'NoneType' object is not iterable
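The last frame in the trace is PyHive's _fetch_more, which pairs the column data returned by the server with a schema via zip(response.results.columns, schema). Python raises exactly this TypeError when either argument to zip is None, so at that point either the fetched results carried no columns or there was no schema to pair them with; the trace alone does not show which. The following is a minimal sketch of that failure mode with plain stand-in objects. FakeResults and fetch_more_like are hypothetical names for illustration, not PyHive's API.

```python
from typing import List, Optional, Sequence, Tuple


class FakeResults:
    """Hypothetical stand-in for the result object whose .columns is zipped in the trace."""

    def __init__(self, columns: Optional[List[list]]) -> None:
        self.columns = columns


def fetch_more_like(results: FakeResults,
                    schema: Optional[Sequence[Tuple[str, str]]]) -> List[tuple]:
    """Mimic the failing expression: zip(response.results.columns, schema).

    Each column is paired with its (name, type) schema entry, then the
    columns are transposed into row tuples.
    """
    columns = [col for col, _col_schema in zip(results.columns, schema)]
    return list(zip(*columns))


# Healthy case: two columns and two schema entries give two rows.
ok = fetch_more_like(FakeResults([[1, 2], ["a", "b"]]),
                     [("id", "int"), ("name", "string")])
print(ok)  # [(1, 'a'), (2, 'b')]

# Failing cases: no column data in the response, or no schema at all.
for results, schema in [(FakeResults(None), [("hello", "string")]),
                        (FakeResults([["world"]]), None)]:
    try:
        fetch_more_like(results, schema)
    except TypeError as exc:
        print("TypeError:", exc)
```

Both failing cases raise the TypeError as soon as zip() tries to iterate over the None argument.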

Is there some configuration for unit tests that I missed, which is making this fail via PyHive, Thrift, etc.?

When I look at the compiled SQL for the unit tests in the target/ folder, it is valid SQL that returns the expected results when I run it manually in beeline.
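To separate a dbt problem from a PyHive/Thrift problem, one option is to send the compiled SQL through a PyHive cursor directly, bypassing dbt, and call fetchall() on it the same way the trace shows dbt doing. This is a minimal sketch under stated assumptions: the Thrift server is reachable at localhost:10000 and the compiled SQL has been copied into a local file. The host, port, file name and the check_statement helper are placeholders; hive.connect, cursor(), execute(), description and fetchall() are PyHive's DB-API calls.

```python
from pathlib import Path
from typing import Any, List, Optional, Sequence, Tuple


def check_statement(cursor: Any, sql: str) -> Tuple[Optional[Sequence], List[tuple]]:
    """Execute one statement on a DB-API cursor and fetch every row.

    Returns (cursor.description, rows) so you can see whether the server
    reported a result schema for the statement.
    """
    cursor.execute(sql)
    description = cursor.description
    rows = cursor.fetchall()
    return description, rows


def main(sql_path: str = "compiled_unit_test.sql") -> None:
    try:
        from pyhive import hive  # Assumption: PyHive is installed.
    except ImportError:
        print("PyHive is not installed; skipping the live check.")
        return

    # Placeholder path: copy the compiled unit-test SQL from target/ here.
    path = Path(sql_path)
    if not path.exists():
        print(f"{path} not found; copy the compiled SQL there first.")
        return
    # Drop any trailing semicolon (a beeline-style statement terminator).
    sql = path.read_text().strip().rstrip(";")

    # Assumption: the dockerized Spark Thrift server listens on localhost:10000.
    conn = hive.connect(host="localhost", port=10000)
    try:
        description, rows = check_statement(conn.cursor(), sql)
        print("description:", description)
        print("rows:", rows)
    finally:
        conn.close()


if __name__ == "__main__":
    main()
```

If fetchall() raises the same TypeError here, the failure reproduces without dbt in the loop.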

Environment

- OS: macOS 14.5
- Python: 3.10
- dbt: 1.8

Which database adapter are you using with dbt?

spark

Additional Context

This is specific to unit tests, since data tests run perfectly fine.

dbeatty10 commented 6 months ago

Thanks for raising this issue @KLarrabee-Arcadia.

To help narrow down the issue, could you try out this "hello world" example to see if it works for you?

models/hello_world.sql

select 'world' as hello

models/_properties.yml

unit_tests:
  - name: test_hello_world
    model: hello_world
    given: []
    expect:
      rows:
        - {hello: world}

Run this command to execute the unit tests and then build the model if they pass:

dbt build --select hello_world

KLarrabee-Arcadia commented 6 months ago

Thanks for responding @dbeatty10! I just added it and see the same error:

❯ docker compose run dbt dbt build --select hello_world

WARN[0000] /Users/kevinlarrabee/projects/dbt-unit-test-example/docker-compose.yml: `version` is obsolete
18:33:27  Running with dbt=1.8.0
18:33:27  Registered adapter: spark=1.8.0
18:33:27  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.int_customers_per_store
18:33:27  Found 3 models, 8 data tests, 453 macros, 3 unit tests
18:33:27
18:33:32  Concurrency: 1 threads (target='dev')
18:33:32
18:33:32  1 of 2 START unit_test hello_world::test_hello_world ........................... [RUN]
18:33:37  1 of 2 ERROR hello_world::test_hello_world ..................................... [ERROR in 5.11s]
18:33:37  2 of 2 SKIP relation hudi_dbt.hello_world ...................................... [SKIP]
18:33:43
18:33:43  Finished running 1 unit test, 1 view model in 0 hours 0 minutes and 15.35 seconds (15.35s).
18:33:43
18:33:43  Completed with 1 error and 0 warnings:
18:33:43
18:33:43    Runtime Error in unit_test test_hello_world (models/_properties.yml)
  An error occurred during execution of unit test 'test_hello_world'. There may be an error in the unit test definition: check the data types.
   Compilation Error
    'NoneType' object is not iterable

    > in macro run_query (macros/etc/statement.sql)
    > called by macro materialization_unit_default (macros/materializations/tests/unit.sql)
    > called by <Unknown>
18:33:43
18:33:43  Done. PASS=0 WARN=0 ERROR=1 SKIP=1 TOTAL=2
KLarrabee-Arcadia commented 6 months ago

@dbeatty10 I also set up a different branch in that example repo with two Spark backends (one configured for Hudi and one with the default configuration) as well as a Postgres backend.

Running dbt test against either Spark backend shows the same NoneType error, but, as a sanity check, it does work against the Postgres backend:

$ docker compose run --build dbt dbt test --target postgres

...
17:38:35  8 of 11 PASS unique_professors_name ............................................ [PASS in 5.05s]
17:38:35  9 of 11 START unit_test characters::test_characters ............................ [RUN]
17:38:35  9 of 11 FAIL 1 characters::test_characters ..................................... [FAIL 1 in 0.07s]
17:38:35  10 of 11 START unit_test hello_world::test_hello_world ......................... [RUN]
17:38:35  10 of 11 PASS hello_world::test_hello_world .................................... [PASS in 0.02s]
17:38:35  11 of 11 START unit_test professors::test_professors ........................... [RUN]
17:38:35  11 of 11 ERROR professors::test_professors ..................................... [ERROR in 0.02s]
17:38:35
17:38:35  Finished running 8 data tests, 3 unit tests in 0 hours 0 minutes and 10.44 seconds (10.44s).
17:38:35
17:38:35  Completed with 2 errors and 0 warnings:
17:38:35
17:38:35  Failure in unit_test test_characters (models/unit_tests/test_characters.yml)
17:38:35

actual differs from expected:

@@ ,id,name
→  ,1 ,kevin→Philip J. Fry
+++,2 ,Turanga Leela
+++,3 ,Bender Bending Rodríguez
+++,4 ,Prof. Hubert J. Farnsworth
+++,5 ,Professor Ogden Wernstrom
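In the Postgres run the unit test executes and fails normally: dbt reports that the row with id 1 has a different name and marks four extra rows with +++. A keyed row comparison along those lines can be sketched in plain Python. This is a simplified illustration, not dbt's own diff implementation, and the sample rows are inferred from the printed diff on the assumption that the expected fixture contained only (1, kevin).

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_rows(expected: List[Row], actual: List[Row], key: str = "id"
              ) -> Tuple[List[Tuple[object, str, object, object]], List[Row], List[Row]]:
    """Compare two row lists keyed on `key`.

    Returns (changed, added, missing):
      changed: (key value, column, expected value, actual value) per differing cell
      added:   rows present in actual but not in expected
      missing: rows present in expected but not in actual
    """
    exp_by_key = {r[key]: r for r in expected}
    act_by_key = {r[key]: r for r in actual}

    changed = []
    for k, exp_row in exp_by_key.items():
        act_row = act_by_key.get(k)
        if act_row is None:
            continue
        for col, exp_val in exp_row.items():
            if act_row.get(col) != exp_val:
                changed.append((k, col, exp_val, act_row.get(col)))

    added = [r for k, r in act_by_key.items() if k not in exp_by_key]
    missing = [r for k, r in exp_by_key.items() if k not in act_by_key]
    return changed, added, missing


# Assumed fixtures, inferred from the printed diff (not taken from the repo).
expected = [{"id": 1, "name": "kevin"}]
actual = [
    {"id": 1, "name": "Philip J. Fry"},
    {"id": 2, "name": "Turanga Leela"},
    {"id": 3, "name": "Bender Bending Rodríguez"},
    {"id": 4, "name": "Prof. Hubert J. Farnsworth"},
    {"id": 5, "name": "Professor Ogden Wernstrom"},
]

changed, added, missing = diff_rows(expected, actual)
for k, col, exp_val, act_val in changed:
    print(f"~ id={k} {col}: {exp_val} -> {act_val}")
for row in added:
    print(f"+ {row}")
print("missing:", missing)
```

Keying on id is what lets a changed name show up as one modified cell rather than as one removed row plus one added row.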