dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Compilation error when calling stage_external_sources from a prehook in a model config #324

Closed: hugoerolex closed this issue 2 hours ago

hugoerolex commented 1 day ago

Describe the bug

When calling the stage_external_sources macro from a pre-hook in a model config, a compilation error is raised when running dbt build. Note that executing the macro through the dbt run-operation command works fine.
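For reference, this is the command-line invocation that works fine (standard usage from the package README; the full-refresh variant is optional):

# stage/refresh all external sources
dbt run-operation stage_external_sources

# optionally, recreate all external tables from scratch
dbt run-operation stage_external_sources --vars "ext_full_refresh: true"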

Steps to reproduce

Create a model with the following minimal configuration:

{{ config(
    materialized="view",
    pre_hook="{{ dbt_external_tables.stage_external_sources() }}"
) }}

Expected results

External tables defined in my sources.yml are created if they do not exist, or refreshed if they do.

Actual results

Compilation error

Screenshots and log output

Server logs summary:

10:05:21 Running dbt...
10:05:22 1 of 26 START external source schema.external_table
10:05:22 1 of 26 (1) create schema if not exists                             database.sch...  
10:05:22 Encountered an error:
Compilation Error in model XXX (path/to/model/XXX.sql)
  'None' has no attribute 'keys'

Server logs details:

11:08:40 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d197fdbd450>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d197fdf4c90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d197fb61010>]}
11:08:40 Running dbt...
11:08:40 running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/usr/src/dbt-server-shared/.dbt', 'debug': 'False', 'version_check': 'True', 'log_path': '/usr/src/dbt-server-shared/working_dir/28f6fca5-dba7-4bbf-ba06-034e56bea7ee', 'fail_fast': 'False', 'warn_error': 'None', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'introspect': 'True', 'invocation_command': 'dbt -A dbt_worker.app worker --loglevel=DEBUG --concurrency=2 --max-memory-per-child=500000', 'target_path': 'None', 'log_format': 'json', 'send_anonymous_usage_stats': 'True'}
11:08:40 Discovered Exposures:
[]
11:08:40 Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '97aa710e-3d9b-447b-9163-cc643e72e51d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d1977557d10>]}
11:08:40 Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '97aa710e-3d9b-447b-9163-cc643e72e51d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d197f9d7a10>]}
11:08:40 Registered adapter: snowflake=1.9.0-post10+a16654bed4d67d1a922bf87e579a285c35bd55ac
11:08:40 checksum: 007e558dd5cbf66720904e41fb9ea366a52729d1df24383e6687de4655d0a2a1, vars: {}, profile: user, target: , version: 2024.11.20+a99d1be
11:08:40 Observability Metric: msgpack_manifest_bytes=2442424.0
11:08:40 Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
11:08:40 Partial parsing: updated file: project://models/path/to/model/XXX.sql
11:08:40 1 of 26 START external source schema.external_table
11:08:40 1 of 26 (1) create schema if not exists                             database.sch...  
11:08:40 Encountered an error:
Compilation Error in model XXX (path/to/model/XXX.sql)
  'None' has no attribute 'keys'
11:08:40 Resource report: {"command_name": "parse", "command_wall_clock_time": 0.28523365, "process_user_time": 26.76897, "process_kernel_time": 4.135137, "process_mem_max_rss": "332524", "process_out_blocks": "338448", "command_success": false, "process_in_blocks": "0"}
11:08:40 Observability Metric: command_success=0.0
11:08:40 Observability Metric: command_wall_clock_time=0.28523364663124084
11:08:40 Observability Metric: process_user_time=26.768970489501953
11:08:40 Observability Metric: process_kernel_time=4.13513708114624
11:08:40 Observability Metric: process_mem_max_rss=332524.0
11:08:40 Command `cli parse` failed at 11:08:40.876994 after 0.29 seconds
11:08:40 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d19812e8d10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d197f9aa4d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7d197fa8c350>]}

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/codegen
    version: 0.12.1
  - package: dbt-labs/dbt_utils
    version: 1.1.1  
  - package: dbt-labs/dbt_project_evaluator
    version: 0.12.1
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
  - package: dbt-labs/dbt_external_tables
    version: 0.10.0

Which database are you using dbt with? Snowflake

The output of dbt --version: dbt Cloud versionless

The operating system you're using: dbt Cloud, running in Chrome on Windows

jeremyyeo commented 10 hours ago

Hey Hugo,

Just doing a quick sense check...

# packages.yml
packages:
  - package: dbt-labs/dbt_external_tables
    version: 0.10.0

# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: 1.0

models:
  my_dbt_project:
    +materialized: table

# models/sources.yml
version: 2
sources:
  - name: raw
    database: db
    schema: dbt_jyeo
    loader: S3
    tables:
      - name: ext_people
        external:
          location: "@dbt_test.dbt_external_tables_integration_tests_snowflake.dbt_external_tables_testing/csv"
          file_format: "(type = csv skip_header = 1)"
        columns: &people-cols
          - name: id
            data_type: int
          - name: first_name
            data_type: varchar(64)
          - name: last_name
            data_type: varchar(64)
          - name: email
            data_type: varchar(64)
-- models/new_tbl.sql
{{
    config(
        pre_hook = "{{ dbt_external_tables.stage_external_sources() }}"
    )
}}

select 1 id

First things first, delete the external table if it exists so we're starting from scratch:

-- run in snowflake
drop table if exists db.dbt_jyeo.ext_people;
select * from db.dbt_jyeo.ext_people;
-- Object 'DB.DBT_JYEO.EXT_PEOPLE' does not exist or not authorized.

Now, we build:

$ dbt --debug run
01:45:09  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1069d08d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106dcaa10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106dcb450>]}
01:45:09  Running with dbt=1.9.0-b4
01:45:09  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'fail_fast': 'False', 'debug': 'True', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'False', 'quiet': 'False', 'no_print': 'None', 'log_format': 'default', 'static_parser': 'True', 'introspect': 'True', 'invocation_command': 'dbt --debug run', 'target_path': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'send_anonymous_usage_stats': 'True'}
01:45:10  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'a5d38a67-5d4d-402d-9940-4efdfeb38682', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x127497290>]}
01:45:10  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'a5d38a67-5d4d-402d-9940-4efdfeb38682', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10657b350>]}
01:45:10  Registered adapter: snowflake=1.9.0-b1
01:45:10  checksum: 0d3acddb5afbce98f9888b13248063cae3d09e6e329263e3ffe4b2a6aa072c8d, vars: {}, profile: , target: , version: 1.9.0b4
01:45:10  Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
01:45:10  Partial parsing: updated file: my_dbt_project://models/sources.yml
01:45:10  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'a5d38a67-5d4d-402d-9940-4efdfeb38682', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x15ab5c9d0>]}
01:45:10  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
01:45:10  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
01:45:10  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'a5d38a67-5d4d-402d-9940-4efdfeb38682', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x15ae8ce90>]}
01:45:10  Found 1 model, 1 source, 518 macros
01:45:10  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'a5d38a67-5d4d-402d-9940-4efdfeb38682', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x15ab5cc10>]}
01:45:10  
01:45:10  Concurrency: 1 threads (target='sf')
01:45:10  
01:45:10  Acquiring new snowflake connection 'master'
01:45:10  Acquiring new snowflake connection 'list_db'
01:45:10  Using snowflake connection "list_db"
01:45:10  On list_db: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "connection_name": "list_db"} */
show terse schemas in database db
    limit 10000
01:45:10  Opening a new connection, currently in state init
01:45:13  SQL status: SUCCESS 2 in 2.376 seconds
01:45:13  Re-using an available connection from the pool (formerly list_db, now list_db_dbt_jyeo)
01:45:13  Using snowflake connection "list_db_dbt_jyeo"
01:45:13  On list_db_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "connection_name": "list_db_dbt_jyeo"} */
show objects in db.dbt_jyeo limit 10000;
01:45:13  SQL status: SUCCESS 2 in 0.396 seconds
01:45:13  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'a5d38a67-5d4d-402d-9940-4efdfeb38682', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x15ae54410>]}
01:45:13  Began running node model.my_dbt_project.new_tbl
01:45:13  1 of 1 START sql table model dbt_jyeo.new_tbl .................................. [RUN]
01:45:13  Re-using an available connection from the pool (formerly list_db_dbt_jyeo, now model.my_dbt_project.new_tbl)
01:45:13  Began compiling node model.my_dbt_project.new_tbl
01:45:13  Writing injected SQL for node "model.my_dbt_project.new_tbl"
01:45:13  Began executing node model.my_dbt_project.new_tbl
01:45:13  1 of 1 START external source dbt_jyeo.ext_people
01:45:13  Using snowflake connection "model.my_dbt_project.new_tbl"
01:45:13  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
show terse schemas like 'dbt_jyeo' in database db limit 1;
01:45:13  SQL status: SUCCESS 1 in 0.350 seconds
01:45:13  1 of 1 (1) select 'Schema dbt_jyeo exists' from dual;
01:45:13  Using snowflake connection "model.my_dbt_project.new_tbl"
01:45:13  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
select 'Schema dbt_jyeo exists' from dual;
01:45:14  SQL status: SUCCESS 1 in 0.328 seconds
01:45:14  1 of 1 (1) SUCCESS 1
01:45:14  1 of 1 (2) create or replace external table db.dbt_jyeo.ext_people(                id int a...  
01:45:14  Using snowflake connection "model.my_dbt_project.new_tbl"
01:45:14  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
create or replace external table db.dbt_jyeo.ext_people(
                id int as ((case when is_null_value(value:c1) or lower(value:c1) = 'null' then null else value:c1 end)::int),
                first_name varchar(64) as ((case when is_null_value(value:c2) or lower(value:c2) = 'null' then null else value:c2 end)::varchar(64)),
                last_name varchar(64) as ((case when is_null_value(value:c3) or lower(value:c3) = 'null' then null else value:c3 end)::varchar(64)),
                email varchar(64) as ((case when is_null_value(value:c4) or lower(value:c4) = 'null' then null else value:c4 end)::varchar(64))
        )
    location = @dbt_test.dbt_external_tables_integration_tests_snowflake.dbt_external_tables_testing/csv 

    file_format = (type = csv skip_header = 1)

;
01:45:15  SQL status: SUCCESS 1 in 1.620 seconds
01:45:15  1 of 1 (2) SUCCESS 1
01:45:15  Writing runtime sql for node "model.my_dbt_project.new_tbl"
01:45:15  Using snowflake connection "model.my_dbt_project.new_tbl"
01:45:15  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
create or replace transient table db.dbt_jyeo.new_tbl
         as
        (

select 1 id
        );
01:45:17  SQL status: SUCCESS 1 in 1.295 seconds
01:45:17  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'a5d38a67-5d4d-402d-9940-4efdfeb38682', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x15b4543d0>]}
01:45:17  1 of 1 OK created sql table model dbt_jyeo.new_tbl ............................. [SUCCESS 1 in 3.74s]
01:45:17  Finished running node model.my_dbt_project.new_tbl
01:45:17  Connection 'master' was properly closed.
01:45:17  Connection 'model.my_dbt_project.new_tbl' was left open.
01:45:17  On model.my_dbt_project.new_tbl: Close
01:45:17  
01:45:17  Finished running 1 table model in 0 hours 0 minutes and 7.18 seconds (7.18s).
01:45:17  Command end result
01:45:17  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
01:45:17  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
01:45:17  Wrote artifact RunExecutionResult to /Users/jeremy/git/dbt-basic/target/run_results.json
01:45:17  
01:45:17  Completed successfully
01:45:17  
01:45:17  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
01:45:17  Resource report: {"command_name": "run", "command_success": true, "command_wall_clock_time": 8.062279, "process_user_time": 2.621293, "process_kernel_time": 1.596588, "process_mem_max_rss": "203014144", "process_in_blocks": "0", "process_out_blocks": "0"}
01:45:17  Command `dbt run` succeeded at 14:45:17.822465 after 8.06 seconds
01:45:17  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106ddba90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x102511c10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10244ba10>]}
01:45:17  Flushing usage events
01:45:19  An error was encountered while trying to flush usage events

Do a subsequent run to make sure this still works when the external table already exists from the previous build:

$ dbt --debug run
01:45:58  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104f98b90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104fde7d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104fe6690>]}
01:45:58  Running with dbt=1.9.0-b4
01:45:58  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'profiles_dir': '/Users/jeremy/.dbt', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'debug': 'True', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'empty': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'introspect': 'True', 'invocation_command': 'dbt --debug run', 'target_path': 'None', 'log_format': 'default', 'send_anonymous_usage_stats': 'True'}
01:45:58  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '0c780a08-b282-43f1-a66f-e81c71e661a3', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106fa4cd0>]}
01:45:58  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '0c780a08-b282-43f1-a66f-e81c71e661a3', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10495b250>]}
01:45:58  Registered adapter: snowflake=1.9.0-b1
01:45:58  checksum: 0d3acddb5afbce98f9888b13248063cae3d09e6e329263e3ffe4b2a6aa072c8d, vars: {}, profile: , target: , version: 1.9.0b4
01:45:58  Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
01:45:58  Partial parsing enabled, no changes found, skipping parsing
01:45:58  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '0c780a08-b282-43f1-a66f-e81c71e661a3', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12bc35210>]}
01:45:58  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
01:45:58  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
01:45:58  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '0c780a08-b282-43f1-a66f-e81c71e661a3', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12bd39a10>]}
01:45:58  Found 1 model, 1 source, 518 macros
01:45:58  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '0c780a08-b282-43f1-a66f-e81c71e661a3', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106fb6d10>]}
01:45:58  
01:45:58  Concurrency: 1 threads (target='sf')
01:45:58  
01:45:58  Acquiring new snowflake connection 'master'
01:45:58  Acquiring new snowflake connection 'list_db'
01:45:58  Using snowflake connection "list_db"
01:45:58  On list_db: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "connection_name": "list_db"} */
show terse schemas in database db
    limit 10000
01:45:58  Opening a new connection, currently in state init
01:46:00  SQL status: SUCCESS 2 in 2.049 seconds
01:46:00  Re-using an available connection from the pool (formerly list_db, now list_db_dbt_jyeo)
01:46:00  Using snowflake connection "list_db_dbt_jyeo"
01:46:00  On list_db_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "connection_name": "list_db_dbt_jyeo"} */
show objects in db.dbt_jyeo limit 10000;
01:46:01  SQL status: SUCCESS 3 in 0.372 seconds
01:46:01  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '0c780a08-b282-43f1-a66f-e81c71e661a3', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12b8bfe50>]}
01:46:01  Began running node model.my_dbt_project.new_tbl
01:46:01  1 of 1 START sql table model dbt_jyeo.new_tbl .................................. [RUN]
01:46:01  Re-using an available connection from the pool (formerly list_db_dbt_jyeo, now model.my_dbt_project.new_tbl)
01:46:01  Began compiling node model.my_dbt_project.new_tbl
01:46:01  Writing injected SQL for node "model.my_dbt_project.new_tbl"
01:46:01  Began executing node model.my_dbt_project.new_tbl
01:46:01  1 of 1 START external source dbt_jyeo.ext_people
01:46:01  1 of 1 (1) alter external table db.dbt_jyeo.ext_people refresh;        commit;
01:46:01  Using snowflake connection "model.my_dbt_project.new_tbl"
01:46:01  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
BEGIN
01:46:01  SQL status: SUCCESS 1 in 0.404 seconds
01:46:01  Using snowflake connection "model.my_dbt_project.new_tbl"
01:46:01  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
alter external table db.dbt_jyeo.ext_people refresh;
01:46:02  SQL status: SUCCESS 0 in 0.645 seconds
01:46:02  Using snowflake connection "model.my_dbt_project.new_tbl"
01:46:02  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
COMMIT
01:46:02  SQL status: SUCCESS 1 in 0.410 seconds
01:46:02  1 of 1 (1) SUCCESS 0
01:46:02  Writing runtime sql for node "model.my_dbt_project.new_tbl"
01:46:02  Using snowflake connection "model.my_dbt_project.new_tbl"
01:46:02  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
create or replace transient table db.dbt_jyeo.new_tbl
         as
        (

select 1 id
        );
01:46:03  SQL status: SUCCESS 1 in 1.020 seconds
01:46:03  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '0c780a08-b282-43f1-a66f-e81c71e661a3', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10313d150>]}
01:46:03  1 of 1 OK created sql table model dbt_jyeo.new_tbl ............................. [SUCCESS 1 in 2.58s]
01:46:03  Finished running node model.my_dbt_project.new_tbl
01:46:03  Connection 'master' was properly closed.
01:46:03  Connection 'model.my_dbt_project.new_tbl' was left open.
01:46:03  On model.my_dbt_project.new_tbl: Close
01:46:04  
01:46:04  Finished running 1 table model in 0 hours 0 minutes and 5.69 seconds (5.69s).
01:46:04  Command end result
01:46:04  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
01:46:04  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
01:46:04  Wrote artifact RunExecutionResult to /Users/jeremy/git/dbt-basic/target/run_results.json
01:46:04  
01:46:04  Completed successfully
01:46:04  
01:46:04  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
01:46:04  Resource report: {"command_name": "run", "command_success": true, "command_wall_clock_time": 6.4479094, "process_user_time": 1.765473, "process_kernel_time": 2.248902, "process_mem_max_rss": "202440704", "process_in_blocks": "0", "process_out_blocks": "0"}
01:46:04  Command `dbt run` succeeded at 14:46:04.568165 after 6.45 seconds
01:46:04  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12b896990>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10120b0d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10120b9d0>]}
01:46:04  Flushing usage events
01:46:05  An error was encountered while trying to flush usage events

No apparent errors when using the dbt_external_tables.stage_external_sources() macro in a pre-hook.

Second pass at this: it looks like Hugo's logs were erroring around the creation of the schema that the external table is written to, i.e. the schema does not yet exist, whereas my runs above were against a schema that already exists. Let's try with a schema that does not yet exist:

# models/sources.yml
version: 2
sources:
  - name: raw
    database: db
    schema: some_new_schema
    loader: S3
    tables:
      - name: ext_people
        external: &people
          location: "@dbt_test.dbt_external_tables_integration_tests_snowflake.dbt_external_tables_testing/csv"
          file_format: "(type = csv skip_header = 1)"
        columns: &people-cols
          - name: id
            data_type: int
          - name: first_name
            data_type: varchar(64)
          - name: last_name
            data_type: varchar(64)
          - name: email
            data_type: varchar(64)

Notice the use of some_new_schema here.

$ dbt --debug run
01:50:36  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1069c08d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106bd4f10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106bf7750>]}
01:50:36  Running with dbt=1.9.0-b4
01:50:36  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'fail_fast': 'False', 'debug': 'True', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'version_check': 'True', 'profiles_dir': '/Users/jeremy/.dbt', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'empty': 'False', 'log_format': 'default', 'invocation_command': 'dbt --debug run', 'introspect': 'True', 'static_parser': 'True', 'target_path': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'send_anonymous_usage_stats': 'True'}
01:50:37  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '9137a38c-8209-4fea-ad4c-9c26517c6153', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x105ed0a90>]}
01:50:37  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '9137a38c-8209-4fea-ad4c-9c26517c6153', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10656b350>]}
01:50:37  Registered adapter: snowflake=1.9.0-b1
01:50:37  checksum: 0d3acddb5afbce98f9888b13248063cae3d09e6e329263e3ffe4b2a6aa072c8d, vars: {}, profile: , target: , version: 1.9.0b4
01:50:37  Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
01:50:37  Partial parsing: updated file: my_dbt_project://models/sources.yml
01:50:37  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '9137a38c-8209-4fea-ad4c-9c26517c6153', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x14022ead0>]}
01:50:37  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
01:50:37  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
01:50:37  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '9137a38c-8209-4fea-ad4c-9c26517c6153', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1407241d0>]}
01:50:37  Found 1 model, 1 source, 518 macros
01:50:37  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '9137a38c-8209-4fea-ad4c-9c26517c6153', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1402ea110>]}
01:50:37  
01:50:37  Concurrency: 1 threads (target='sf')
01:50:37  
01:50:37  Acquiring new snowflake connection 'master'
01:50:37  Acquiring new snowflake connection 'list_db'
01:50:37  Using snowflake connection "list_db"
01:50:37  On list_db: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "connection_name": "list_db"} */
show terse schemas in database db
    limit 10000
01:50:37  Opening a new connection, currently in state init
01:50:39  SQL status: SUCCESS 2 in 2.399 seconds
01:50:39  Re-using an available connection from the pool (formerly list_db, now list_db_dbt_jyeo)
01:50:39  Using snowflake connection "list_db_dbt_jyeo"
01:50:39  On list_db_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "connection_name": "list_db_dbt_jyeo"} */
show objects in db.dbt_jyeo limit 10000;
01:50:40  SQL status: SUCCESS 3 in 0.370 seconds
01:50:40  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '9137a38c-8209-4fea-ad4c-9c26517c6153', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x140a4dcd0>]}
01:50:40  Began running node model.my_dbt_project.new_tbl
01:50:40  1 of 1 START sql table model dbt_jyeo.new_tbl .................................. [RUN]
01:50:40  Re-using an available connection from the pool (formerly list_db_dbt_jyeo, now model.my_dbt_project.new_tbl)
01:50:40  Began compiling node model.my_dbt_project.new_tbl
01:50:40  Writing injected SQL for node "model.my_dbt_project.new_tbl"
01:50:40  Began executing node model.my_dbt_project.new_tbl
01:50:40  1 of 1 START external source some_new_schema.ext_people
01:50:40  On "model.my_dbt_project.new_tbl": cache miss for schema "db.some_new_schema", this is inefficient
01:50:40  Using snowflake connection "model.my_dbt_project.new_tbl"
01:50:40  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
show objects in db.some_new_schema limit 10000;
01:50:40  Snowflake adapter: Snowflake query id: 01b8880e-0906-d2be-000d-378341b81996
01:50:40  Snowflake adapter: Snowflake error: 002043 (02000): SQL compilation error:
Object does not exist, or operation cannot be performed.
01:50:40  Snowflake adapter: Error running SQL: macro list_relations_without_caching
01:50:40  Snowflake adapter: Rolling back transaction.
01:50:40  While listing relations in database=db, schema=some_new_schema, found: 
01:50:40  Using snowflake connection "model.my_dbt_project.new_tbl"
01:50:40  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
show terse schemas like 'some_new_schema' in database db limit 1;
01:50:41  SQL status: SUCCESS 0 in 0.420 seconds
01:50:41  1 of 1 (1) create schema if not exists                             db.some_new_schema      ...  
01:50:41  Using snowflake connection "model.my_dbt_project.new_tbl"
01:50:41  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
create schema if not exists 

                db.some_new_schema

        ;
01:50:41  SQL status: SUCCESS 1 in 0.384 seconds
01:50:41  1 of 1 (1) SUCCESS 1
01:50:41  1 of 1 (2) create or replace external table db.some_new_schema.ext_people(                i...  
01:50:41  Using snowflake connection "model.my_dbt_project.new_tbl"
01:50:41  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
create or replace external table db.some_new_schema.ext_people(
                id int as ((case when is_null_value(value:c1) or lower(value:c1) = 'null' then null else value:c1 end)::int),
                first_name varchar(64) as ((case when is_null_value(value:c2) or lower(value:c2) = 'null' then null else value:c2 end)::varchar(64)),
                last_name varchar(64) as ((case when is_null_value(value:c3) or lower(value:c3) = 'null' then null else value:c3 end)::varchar(64)),
                email varchar(64) as ((case when is_null_value(value:c4) or lower(value:c4) = 'null' then null else value:c4 end)::varchar(64))
        )
    location = @dbt_test.dbt_external_tables_integration_tests_snowflake.dbt_external_tables_testing/csv 

    file_format = (type = csv skip_header = 1)

;
01:50:43  SQL status: SUCCESS 1 in 1.704 seconds
01:50:43  1 of 1 (2) SUCCESS 1
01:50:43  Writing runtime sql for node "model.my_dbt_project.new_tbl"
01:50:43  Using snowflake connection "model.my_dbt_project.new_tbl"
01:50:43  On model.my_dbt_project.new_tbl: /* {"app": "dbt", "dbt_version": "1.9.0b4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.new_tbl"} */
create or replace transient table db.dbt_jyeo.new_tbl
         as
        (

select 1 id
        );
01:50:44  SQL status: SUCCESS 1 in 1.233 seconds
01:50:44  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '9137a38c-8209-4fea-ad4c-9c26517c6153', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x140d3f110>]}
01:50:44  1 of 1 OK created sql table model dbt_jyeo.new_tbl ............................. [SUCCESS 1 in 4.34s]
01:50:44  Finished running node model.my_dbt_project.new_tbl
01:50:44  Connection 'master' was properly closed.
01:50:44  Connection 'model.my_dbt_project.new_tbl' was left open.
01:50:44  On model.my_dbt_project.new_tbl: Close
01:50:45  
01:50:45  Finished running 1 table model in 0 hours 0 minutes and 7.79 seconds (7.79s).
01:50:45  Command end result
01:50:45  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
01:50:45  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
01:50:45  Wrote artifact RunExecutionResult to /Users/jeremy/git/dbt-basic/target/run_results.json
01:50:45  
01:50:45  Completed successfully
01:50:45  
01:50:45  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
01:50:45  Resource report: {"command_name": "run", "command_success": true, "command_wall_clock_time": 8.64233, "process_user_time": 2.656455, "process_kernel_time": 1.653997, "process_mem_max_rss": "207814656", "process_in_blocks": "0", "process_out_blocks": "0"}
01:50:45  Command `dbt run` succeeded at 14:50:45.285703 after 8.64 seconds
01:50:45  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106c27ed0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106c273d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x102ee1c10>]}
01:50:45  Flushing usage events
01:50:46  An error was encountered while trying to flush usage events

Seems okay.

jeremyyeo commented 10 hours ago

Initial take: there is actually a legitimate compilation error with Hugo's model XXX.sql. To truly isolate it, make sure the project is clean of all customizations (i.e. remove all models, macros, overrides, and all other packages) and keep just a single toy model plus a single source, as the example above demonstrates.
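One quick sanity check, reusing the XXX placeholder from the logs above, is to compile just that model in isolation so any Jinja error in the model body surfaces on its own:

dbt compile --select XXX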

hugoerolex commented 2 hours ago

Hi @jeremyyeo, sorry for the time lost. We isolated the problem with support: it was a partial parsing bug. The issue doesn't reproduce after cleaning the target folder. Sorry for the noise.
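For anyone hitting the same symptom, a minimal sketch of the workaround, assuming a standard clean-targets configuration:

# clear stale parse artifacts (target/ and friends), then re-run
dbt clean
dbt build

# or bypass partial parsing for a single invocation
dbt build --no-partial-parse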