duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
790 stars 70 forks source link

external location in copy with filesystem not works #317

Closed ZergRocks closed 6 months ago

ZergRocks commented 6 months ago
import duckdb
import fsspec

fs = fsspec.filesystem('jupyter', url='{jupyter lab endpoint}', tok='{jupyter lab token}')
conn = duckdb.connect(":memory:")
conn.register_filesystem(fs)
conn.sql(f"""
  copy(
  SELECT *
  FROM 'jupyter://work/1-12.parquet'
  ) to 'jupyter://1-13.parquet' (format 'parquet')
""")

The above copy query works well in my Jupiter Lab and processes the parquet in the folder to complete the writing well outside the folder

However, the external.sql macro in dbt-duckdb only reads well, and writes are done by creating a folder structure called jupiter: in my local file system

Maybe the maintenance processing for the fsspec session was not considered in the phrase that read and write create and insert and write by temp table, but I couldn't find the exact reason for it at this time.

I was wondering if you could tell me how to solve this part or if you would like to reflect it in PR in the future

also same issue arrised with sftp filesystem(I tried)

# profiles.yml
dbt_duckdb:
  outputs:
    dev:
      type: duckdb
      path: ':memory:'
      filesystems:
        - fs: jupyter
          url: '{jupyter lab endpoint}'
          tok: '{jupyter lab endpoint}'
      plugins:
        - module: delta
  target: dev

# dbt_project.yml
name: 'dbt_duckdb'
version: '1.0.0'
config-version: 2
profile: 'dbt_duckdb_delta'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
clean-targets:
  - "target"
  - "dbt_packages"
models:
  dbt_duckdb_delta:
    delta-external:
      +materialized: 'external_table'
ZergRocks commented 6 months ago

@jwills could you read this if you have time sir?

jwills commented 6 months ago

@ZergRocks happy new year + two questions:

1) In the dbt_project.yml the materialization type you have is external_table but I think you meant external? I don't have an external_table materialization defined, I assumed that was a typo. 2) What location are you using for the models that you are trying to materialize externally?

jwills commented 6 months ago

(There's a jupyter (the fsspec impl) vs. jupiter (the file location on the local file system) potential typo in your description of the problem that made me wonder if the location field had an issue)

ZergRocks commented 6 months ago

changed from external_table to external and set proper location works.. sorry for making confusion 🙇

I just tried to use this branch https://github.com/duckdb/dbt-duckdb/pull/284 wtih jupyter fs and had experienced not work what I expected

now I know I made wrong config to rollback and external & jupyter option works well and have no problem

thanks for checking and sorry again 🙇 @jwills

milicevica23 commented 6 months ago

Hi @ZergRocks, sorry for confusion here, in my branch i made a new materialization "external_table" in order to learn how it has to be done, because i wanted to showcase the delta plugin. My end goal is to incorporate delta plagin in materialization "external" which is used by all other use cases