duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
790 stars 70 forks source link

Support for blob storage/azure? #318

Closed nfoerster closed 5 months ago

nfoerster commented 5 months ago

Does this only work for s3 or also with azure(abfs)?

sources:
  - name: external_source
    meta:
      external_location: "s3://my-bucket/my-sources/{name}.parquet"
    tables:
      - name: source1
      - name: source2

Both don't work although connection string and accountname/accountkey:

sources:
  - name: test
    meta:
      external_location: "azure://test/{name}/datetime.parquet"
    tables:
      - name: time
sources:
  - name: test
    meta:
      external_location: "abfs://test/{name}/datetime.parquet"
    tables:
      - name: time
jwills commented 5 months ago

They should work, that's just a string substitution-- did you configure the azure extension in your profiles.yml? https://duckdb.org/docs/extensions/azure.html

jwills commented 5 months ago

hey @nfoerster I deleted that comment cause it had account info on it, hope that was right

nfoerster commented 5 months ago

hey @nfoerster I deleted that comment cause it had account info on it, hope that was right

Oh, no it was just azurite dummy string, so nothing sensitive.

jwills commented 5 months ago

ah good, sorry about that--- hrm, as you said, everything looks fine with your config basically right-- did you try putting double quotes around the azure_storage_connection_string property? I'm wondering if it's getting mangled somehow by the yaml parser

nfoerster commented 5 months ago

unfortunately still the same issue, is there anything else problematic? I found almost no examples using the AZ blobstore as external table location in the WWW, and I'm also struggling to find the possible parameters.

test:
  outputs:
    dev:
      type: duckdb
      path: '/tmp/dbt/dev.duckdb'
      packages-install-path: '/tmp/dbt/dbt_packages'
      extensions: 
        - azure
        - parquet
      plugins: 
        - module: delta
      settings:
        azure_storage_connection_string: "AccountName=devstoreaccount1;AccountKey=REDACTED;DefaultEndpointsProtocol=http;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;QueueEndpoint=http://127.0.0.1:10001/devstoreaccount1;TableEndpoint=http://127.0.0.1:10002/devstoreaccount1;"
  target: dev

sources.yaml:

version: 2

sources:
  - name: azblob
    meta:
      external_location: "azure://test/{name}/*.parquet"
    tables:
      - name: time

dbt run -t dev

08:19:48  Running with dbt=1.7.4
08:19:48  Registered adapter: duckdb=1.7.0
08:19:48  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.test.dev
08:19:48  Found 1 model, 2 sources, 0 exposures, 0 metrics, 391 macros, 0 groups, 0 semantic models
08:19:48  
08:19:48  Concurrency: 1 threads (target='dev')
08:19:48  
08:19:48  1 of 1 START sql view model main.vw_dim_time ................................... [RUN]
08:19:48  Unhandled error while executing 
Generic MicrosoftAzure error: Account must be specified
08:19:48  1 of 1 ERROR creating sql view model main.vw_dim_time .......................... [ERROR in 0.03s]
08:19:48  
08:19:48  Finished running 1 view model in 0 hours 0 minutes and 0.39 seconds (0.39s).
08:19:48  
08:19:48  Completed with 1 error and 0 warnings:
08:19:48  
08:19:48    Generic MicrosoftAzure error: Account must be specified
08:19:48  
08:19:48  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
nfoerster commented 5 months ago

I think it works now after I changed the name of source and view accordingly.