duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
816 stars 69 forks source link

external materialization doesn't work #200

Closed mutkach closed 1 year ago

mutkach commented 1 year ago

Hi. Everything works perfectly until I try to materialize to 'external' as explained in readme. Like this:

{{ config(materialized='external', location='local/directory/file.parquet') }}  
select * from {{ ref('read_parquet') }} limit 100000

Neither 'csv' nor 'json' or inferred works. Am I doing something wrong? Do I need any specific setup in duckdb profile or dbt_project.yml? Thanks in advance, hope it's not some stupid rookie mistake :|

edit: further info

models:
  eda_tda:
    +materialized: table
    +materialized: external
eda_tda:
  outputs:
   dev:
     type: duckdb
     path: /tmp/dbt.duckdb
     extensions:
       - httpfs
       - spatial
       - parquet
  target: dev

Here's log:

00:09:10.899008 [debug] [Thread-1 (]: Using duckdb connection "model.eda_tda.filter_spatial"
00:09:10.899248 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: BEGIN
00:09:10.899474 [debug] [Thread-1 (]: Opening a new connection, currently in state closed
00:09:10.951654 [debug] [Thread-1 (]: SQL status: OK in 0.0 seconds
00:09:10.952061 [debug] [Thread-1 (]: Using duckdb connection "model.eda_tda.filter_spatial"
00:09:10.952299 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: /* {"app": "dbt", "dbt_version": "1.5.2", "profile_name": "eda_tda", "target_name": "dev", "node_id": "model.eda_tda.filter_spatial"} */

    create  table
      "dbt"."main"."filter_spatial__dbt_tmp"

    as (

select * from "dbt"."main"."read_parquet" limit 100000

    );
00:09:11.004147 [debug] [Thread-1 (]: SQL status: OK in 0.0 seconds
00:09:11.006337 [debug] [Thread-1 (]: Using duckdb connection "model.eda_tda.filter_spatial"
00:09:11.006588 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: /* {"app": "dbt", "dbt_version": "1.5.2", "profile_name": "eda_tda", "target_name": "dev", "node_id": "model.eda_tda.filter_spatial"} */
alter table "dbt"."main"."filter_spatial" rename to "filter_spatial__dbt_backup"
00:09:11.007048 [debug] [Thread-1 (]: SQL status: OK in 0.0 seconds
00:09:11.008864 [debug] [Thread-1 (]: Using duckdb connection "model.eda_tda.filter_spatial"
00:09:11.009106 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: /* {"app": "dbt", "dbt_version": "1.5.2", "profile_name": "eda_tda", "target_name": "dev", "node_id": "model.eda_tda.filter_spatial"} */
alter table "dbt"."main"."filter_spatial__dbt_tmp" rename to "filter_spatial"
00:09:11.009454 [debug] [Thread-1 (]: SQL status: OK in 0.0 seconds
00:09:11.010929 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: COMMIT
00:09:11.011185 [debug] [Thread-1 (]: Using duckdb connection "model.eda_tda.filter_spatial"
00:09:11.011417 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: COMMIT
00:09:11.068619 [debug] [Thread-1 (]: SQL status: OK in 0.0 seconds
00:09:11.073010 [debug] [Thread-1 (]: Using duckdb connection "model.eda_tda.filter_spatial"
00:09:11.073277 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: /* {"app": "dbt", "dbt_version": "1.5.2", "profile_name": "eda_tda", "target_name": "dev", "node_id": "model.eda_tda.filter_spatial"} */
drop table if exists "dbt"."main"."filter_spatial__dbt_backup" cascade
00:09:11.079175 [debug] [Thread-1 (]: SQL status: OK in 0.0 seconds
00:09:11.080513 [debug] [Thread-1 (]: Timing info for model.eda_tda.filter_spatial (execute): 00:09:10.895957 => 00:09:11.080375
00:09:11.080753 [debug] [Thread-1 (]: On model.eda_tda.filter_spatial: Close
00:09:11.416077 [debug] [Thread-1 (]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'd972b3c1-a3bf-4171-9447-64a94274a0b0', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9ce57dff10>]}
00:09:11.416682 [info ] [Thread-1 (]: 3 of 3 OK created sql table model main.filter_spatial .......................... [OK in 0.52s]
00:09:11.417177 [debug] [Thread-1 (]: Finished running node model.eda_tda.filter_spatial
mutkach commented 1 year ago

Cleaned and restarted and somehow it's fixed now

jwills commented 1 year ago

Mmm-- I think the dbt_project.yml should only have one +materialized line, so like:

models:
  eda_tda:
    +materialized: external

not sure what would happen if you have both in there, it might be non-deterministic

mutkach commented 1 year ago

Thanks for looking into it. Just for the record, though, it is not the case actually, as I still have multiple materialized lines in dbt_project.yml