dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.92k stars 1.63k forks source link

[Bug] Error running unit tests that use the `dbt_utils.star` macro #10254

Open megetron3 opened 5 months ago

megetron3 commented 5 months ago

Is this a new bug in dbt-core?

Current Behavior

When running unit tests that involve the star macro from dbt-utils, I encounter an error indicating that the macro cannot be used with ephemeral models. However, the models in question are all materialized as views, not ephemeral models.

Test fails with error:

Compilation Error in unit_test test_positions (models/poc/test_positions.yml)  
  The `star` macro cannot be used with ephemeral models, as it relies on the information schema.  
  `KLARNA` is an ephemeral model. Consider making it a view or table instead.          

  > in macro _is_ephemeral (macros/jinja_helpers/_is_ephemeral.sql)
  > called by macro default__star (macros/sql/star.sql)
  > called by macro star (macros/sql/star.sql)
  > called by unit_test test_positions (models/poc/test_positions.yml)

Expected Behavior

The test should run successfully without any errors, as the model is materialized as a view.

Steps To Reproduce

  1. Create a model that uses the star macro from dbt-utils.
  2. Materialize the model as a view.
  3. Write a unit test for the model
  4. Run dbt test --select .

Relevant code:

unit_tests:
  - name: test_positions
    description: "Should enrich positions"
    model: pos__enriched_pc__klarna
    given:
      - input: source('POS__CANONIC_PC', 'KLARNA')
        rows:
          - {
            "ID": 1,
            "SOURCE_PATH": "example_source_path",
          }
    expect:
      rows:
          - {
            "ID": 1,
            "SOURCE_PATH": "example_source_path"
          }

KLARNA.sql

{{ config(schema='POS__CANONIC_PC') }}
select
  null::integer as ID,
  null::text as SOURCE_PATH
where false

pos__enriched_pc__klarna.sql

{{
    config(
        alias='KLARNA',
        materialized="view"
    )
}}

WITH relevant_at_records AS (
    SELECT * FROM {{ source('POS__CANONIC_PC','KLARNA') }} p
)

SELECT {{ dbt_utils.star(source('POS__CANONIC_PC', 'KLARNA'), 'p') }}
FROM relevant_at_records p

sources.yml

sources:
  - name: POS__CANONIC_PC
    schema: main_POS__CANONIC_PC
    tables:
      - name: KLARNA

Relevant log output

17:51:37.821071 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1030a6650>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104986680>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104986260>]}

============================== 17:51:37.823932 | d44c1e2a-c921-4960-ae79-3532a9943558 ==============================
17:51:37.823932 [info ] [MainThread]: Running with dbt=1.8.1
17:51:37.824236 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'False', 'fail_fast': 'False', 'log_path': '/Users/guy.eliyahu/repo/luma_unit_tests/logs', 'version_check': 'True', 'profiles_dir': '/Users/guy.eliyahu/.dbt', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt test --select test_positions', 'introspect': 'True', 'static_parser': 'True', 'target_path': 'None', 'log_format': 'default', 'send_anonymous_usage_stats': 'True'}
17:51:37.988223 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'd44c1e2a-c921-4960-ae79-3532a9943558', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104adbc40>]}
17:51:38.012620 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'd44c1e2a-c921-4960-ae79-3532a9943558', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104a33ee0>]}
17:51:38.013373 [info ] [MainThread]: Registered adapter: duckdb=1.8.0
17:51:38.028159 [debug] [MainThread]: checksum: ebc509fb14e151ea268e89b60fd5abf912ed99129f87516462718fa2f14f4838, vars: {}, profile: , target: , version: 1.8.1
17:51:38.137805 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
17:51:38.138187 [debug] [MainThread]: Partial parsing: updated file: unit_tests://models/sources.yml
17:51:38.194200 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'd44c1e2a-c921-4960-ae79-3532a9943558', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104aae140>]}
17:51:38.267129 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'd44c1e2a-c921-4960-ae79-3532a9943558', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104ed0c40>]}
17:51:38.267412 [info ] [MainThread]: Found 6 models, 1 source, 525 macros, 2 unit tests
17:51:38.267603 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'd44c1e2a-c921-4960-ae79-3532a9943558', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104ef15a0>]}
17:51:38.268313 [info ] [MainThread]: 
17:51:38.268593 [debug] [MainThread]: Acquiring new duckdb connection 'master'
17:51:38.271115 [debug] [ThreadPool]: Acquiring new duckdb connection 'list_dbt_main'
17:51:38.345485 [debug] [ThreadPool]: Using duckdb connection "list_dbt_main"
17:51:38.345699 [debug] [ThreadPool]: On list_dbt_main: BEGIN
17:51:38.345840 [debug] [ThreadPool]: Opening a new connection, currently in state init
17:51:38.351464 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
17:51:38.351629 [debug] [ThreadPool]: Using duckdb connection "list_dbt_main"
17:51:38.351775 [debug] [ThreadPool]: On list_dbt_main: /* {"app": "dbt", "dbt_version": "1.8.1", "profile_name": "unit_tests", "target_name": "dev", "connection_name": "list_dbt_main"} */
select
      'dbt' as database,
      table_name as name,
      table_schema as schema,
      CASE table_type
        WHEN 'BASE TABLE' THEN 'table'
        WHEN 'VIEW' THEN 'view'
        WHEN 'LOCAL TEMPORARY' THEN 'table'
        END as type
    from system.information_schema.tables
    where table_schema = 'main'
    and table_catalog = 'dbt'

17:51:38.368430 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
17:51:38.369176 [debug] [ThreadPool]: On list_dbt_main: ROLLBACK
17:51:38.369873 [debug] [ThreadPool]: Failed to rollback 'list_dbt_main'
17:51:38.370009 [debug] [ThreadPool]: On list_dbt_main: Close
17:51:38.371237 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly list_dbt_main, now list_dbt_main_POS__CANONIC_PC)
17:51:38.372695 [debug] [ThreadPool]: Using duckdb connection "list_dbt_main_POS__CANONIC_PC"
17:51:38.372850 [debug] [ThreadPool]: On list_dbt_main_POS__CANONIC_PC: BEGIN
17:51:38.372974 [debug] [ThreadPool]: Opening a new connection, currently in state closed
17:51:38.375621 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
17:51:38.375768 [debug] [ThreadPool]: Using duckdb connection "list_dbt_main_POS__CANONIC_PC"
17:51:38.375901 [debug] [ThreadPool]: On list_dbt_main_POS__CANONIC_PC: /* {"app": "dbt", "dbt_version": "1.8.1", "profile_name": "unit_tests", "target_name": "dev", "connection_name": "list_dbt_main_POS__CANONIC_PC"} */
select
      'dbt' as database,
      table_name as name,
      table_schema as schema,
      CASE table_type
        WHEN 'BASE TABLE' THEN 'table'
        WHEN 'VIEW' THEN 'view'
        WHEN 'LOCAL TEMPORARY' THEN 'table'
        END as type
    from system.information_schema.tables
    where table_schema = 'main_POS__CANONIC_PC'
    and table_catalog = 'dbt'

17:51:38.387015 [debug] [ThreadPool]: SQL status: OK in 0.0 seconds
17:51:38.387607 [debug] [ThreadPool]: On list_dbt_main_POS__CANONIC_PC: ROLLBACK
17:51:38.387787 [debug] [ThreadPool]: Failed to rollback 'list_dbt_main_POS__CANONIC_PC'
17:51:38.387905 [debug] [ThreadPool]: On list_dbt_main_POS__CANONIC_PC: Close
17:51:38.388942 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'd44c1e2a-c921-4960-ae79-3532a9943558', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104a82500>]}
17:51:38.389171 [debug] [MainThread]: Using duckdb connection "master"
17:51:38.389292 [debug] [MainThread]: On master: BEGIN
17:51:38.389408 [debug] [MainThread]: Opening a new connection, currently in state init
17:51:38.392039 [debug] [MainThread]: SQL status: OK in 0.0 seconds
17:51:38.392175 [debug] [MainThread]: On master: COMMIT
17:51:38.392292 [debug] [MainThread]: Using duckdb connection "master"
17:51:38.392412 [debug] [MainThread]: On master: COMMIT
17:51:38.392564 [debug] [MainThread]: SQL status: OK in 0.0 seconds
17:51:38.392689 [debug] [MainThread]: On master: Close
17:51:38.393291 [info ] [MainThread]: Concurrency: 1 threads (target='dev')
17:51:38.393515 [info ] [MainThread]: 
17:51:38.394795 [debug] [Thread-1 (]: Began running node unit_test.unit_tests.pos__enriched_pc__klarna.test_positions
17:51:38.395006 [info ] [Thread-1 (]: 1 of 1 START unit_test pos__enriched_pc__klarna::test_positions ................ [RUN]
17:51:38.395239 [debug] [Thread-1 (]: Re-using an available connection from the pool (formerly list_dbt_main_POS__CANONIC_PC, now unit_test.unit_tests.pos__enriched_pc__klarna.test_positions)
17:51:38.395393 [debug] [Thread-1 (]: Began compiling node unit_test.unit_tests.pos__enriched_pc__klarna.test_positions
17:51:38.395542 [debug] [Thread-1 (]: Began executing node unit_test.unit_tests.pos__enriched_pc__klarna.test_positions
17:51:38.446075 [debug] [Thread-1 (]: Compilation Error in unit_test test_positions (models/poc/test_positions.yml)

  The `star` macro cannot be used with ephemeral models, as it relies on the information schema.

  `KLARNA` is an ephemeral model. Consider making it a view or table instead.

  > in macro _is_ephemeral (macros/jinja_helpers/_is_ephemeral.sql)
  > called by macro default__star (macros/sql/star.sql)
  > called by macro star (macros/sql/star.sql)
  > called by unit_test test_positions (models/poc/test_positions.yml)
17:51:38.446418 [error] [Thread-1 (]: 1 of 1 ERROR pos__enriched_pc__klarna::test_positions .......................... [ERROR in 0.05s]
17:51:38.446718 [debug] [Thread-1 (]: Finished running node unit_test.unit_tests.pos__enriched_pc__klarna.test_positions
17:51:38.447576 [debug] [MainThread]: Using duckdb connection "master"
17:51:38.447741 [debug] [MainThread]: On master: BEGIN
17:51:38.447871 [debug] [MainThread]: Opening a new connection, currently in state closed
17:51:38.450818 [debug] [MainThread]: SQL status: OK in 0.0 seconds
17:51:38.450971 [debug] [MainThread]: On master: COMMIT
17:51:38.451095 [debug] [MainThread]: Using duckdb connection "master"
17:51:38.451220 [debug] [MainThread]: On master: COMMIT
17:51:38.451379 [debug] [MainThread]: SQL status: OK in 0.0 seconds
17:51:38.451501 [debug] [MainThread]: On master: Close
17:51:38.452173 [debug] [MainThread]: Connection 'master' was properly closed.
17:51:38.452371 [debug] [MainThread]: Connection 'unit_test.unit_tests.pos__enriched_pc__klarna.test_positions' was properly closed.
17:51:38.452514 [info ] [MainThread]: 
17:51:38.452660 [info ] [MainThread]: Finished running 1 unit test in 0 hours 0 minutes and 0.18 seconds (0.18s).
17:51:38.452939 [debug] [MainThread]: Command end result
17:51:38.470776 [info ] [MainThread]: 
17:51:38.471006 [info ] [MainThread]: Completed with 1 error and 0 warnings:
17:51:38.471147 [info ] [MainThread]: 
17:51:38.471323 [error] [MainThread]:   Compilation Error in unit_test test_positions (models/poc/test_positions.yml)

  The `star` macro cannot be used with ephemeral models, as it relies on the information schema.

  `KLARNA` is an ephemeral model. Consider making it a view or table instead.

  > in macro _is_ephemeral (macros/jinja_helpers/_is_ephemeral.sql)
  > called by macro default__star (macros/sql/star.sql)
  > called by macro star (macros/sql/star.sql)
  > called by unit_test test_positions (models/poc/test_positions.yml)
17:51:38.471464 [info ] [MainThread]: 
17:51:38.471604 [info ] [MainThread]: Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
17:51:38.473212 [debug] [MainThread]: Resource report: {"command_name": "test", "command_wall_clock_time": 0.6966405, "process_user_time": 0.970085, "process_kernel_time": 0.23907, "process_mem_max_rss": "137936896", "command_success": false, "process_in_blocks": "0", "process_out_blocks": "0"}
17:51:38.473459 [debug] [MainThread]: Command `dbt test` failed at 17:51:38.473416 after 0.70 seconds
17:51:38.473646 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1030a6650>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104a33c70>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104e61e10>]}
17:51:38.473814 [debug] [MainThread]: Flushing usage events

Environment

- OS: macos
- Python: 3.10.0
- dbt: 1.8.1

Which database adapter are you using with dbt?

No response

Additional Context

No response

megetron3 commented 5 months ago

just notified on slack that i must mock macros of the unittest:

adding this solve the issue:

    overrides:
      macros:
        dbt_utils.star: 'ID, SOURCE_PATH, PARTNER'
megetron commented 5 months ago

@dbeatty10, could you kindly reopen this?

I've come to realize that testing a model shouldn't necessarily compel you to mock the macro. Rather, it should be an optional step.

kylienhu commented 1 month ago

@dbeatty10 Could you reopen this issue? I'm running into the exact same error, but as I used the dbt_utils.star macro more than once to select columns from different models, I couldn't overwrite the macro results like OP did.

dbeatty10 commented 1 month ago

See below for a reproducible example that involves at least two calls to dbt_utils.star that have different outputs. The example is a bit contrived, but I think it demonstrates the issue being raised.

Reprex

Create these files:

models/my_model_a

select 1 as id, 1 as id_a

models/my_model_b

select 1 as id, 2 as id_b

models/my_model_c

select
    a.id,
    {{ dbt_utils.star(ref('my_model_a'), except=["id"]) }},
    {{ dbt_utils.star(ref('my_model_b'), except=["id"]) }}

from {{ ref('my_model_a') }} as a
inner join {{ ref('my_model_b') }} as b on a.id = b.id

models/_unit_tests.yml

unit_tests:
  - name: dbt_core_10254
    model: my_model_c
    given:
      - input: ref('my_model_a')
        rows:
          - { id: 0, id_b: 123 }
      - input: ref('my_model_b')
        rows:
          - { id: 0, id_c: 456 }
    expect:
      rows:
          - { id: 0, id_b: 123, id_c: 456 }

Run these commands:

dbt run -s +my_model_c
dbt build -s my_model_c

Get this output:

22:47:33  Running with dbt=1.8.6
22:47:33  Registered adapter: postgres=1.8.2
22:47:33  Found 3 models, 533 macros, 1 unit test
22:47:33  
22:47:34  Concurrency: 5 threads (target='postgres')
22:47:34  
22:47:34  1 of 2 START unit_test my_model_c::dbt_core_10254 .............................. [RUN]
22:47:34  1 of 2 ERROR my_model_c::dbt_core_10254 ........................................ [ERROR in 0.05s]
22:47:34  2 of 2 SKIP relation dbt_dbeatty.my_model_c .................................... [SKIP]
22:47:34  
22:47:34  Finished running 1 unit test, 1 view model in 0 hours 0 minutes and 0.36 seconds (0.36s).
22:47:34  
22:47:34  Completed with 1 error and 0 warnings:
22:47:34  
22:47:34    Compilation Error in unit_test dbt_core_10254 (models/_unit_tests.yml)

  The `star` macro cannot be used with ephemeral models, as it relies on the information schema.

  `my_model_a` is an ephemeral model. Consider making it a view or table instead.

  > in macro _is_ephemeral (macros/jinja_helpers/_is_ephemeral.sql)
  > called by macro default__star (macros/sql/star.sql)
  > called by macro star (macros/sql/star.sql)
  > called by unit_test dbt_core_10254 (models/_unit_tests.yml)
22:47:34  
22:47:34  Done. PASS=0 WARN=0 ERROR=1 SKIP=1 TOTAL=2