duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
793 stars · 70 forks

Unable to connect to duckdb database residing on S3 #249

Open fRoDdYy opened 9 months ago

fRoDdYy commented 9 months ago

Referring to the documentation, I am trying to connect to my DuckDB database by attaching it:

outputs:
  dev:
    type: duckdb
    path: tmp/dbt.duckdb
    extensions:
      - httpfs
      - parquet
    settings:
      s3_region: ap-south-1
      s3_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
      s3_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
      s3_session_token: "{{ env_var('AWS_SESSION_TOKEN') }}"
    attach:
      - path: "s3://bucket-name/file-path/database.duckdb"
        read_only: true
    threads: 1
    external_root: s3://bucket-name/output
target: dev

I get the following error:

Runtime Error Catalog Error: Cannot open database "s3://bucket-name/file-path/database.duckdb" in read-only mode: database does not exist

jwills commented 9 months ago

Huh; the path in the attach statement doesn't match the path in the error and it's not totally obvious to me why-- will take a look.

Mause commented 9 months ago

The s3 attach via httpfs story is a bit of a strange one right now - I think you can only set auth via environment variables or query parameters
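As a sketch of the query-parameter route mentioned above (assuming the pre-secrets httpfs behavior where S3 settings can be passed inline on the URL; the exact parameter names are worth verifying against the httpfs docs):

```sql
-- Hypothetical: credentials supplied as URL query parameters rather than via SET
ATTACH 's3://bucket-name/file-path/database.duckdb?s3_region=ap-south-1&s3_access_key_id=...&s3_secret_access_key=...'
AS remote (READ_ONLY);
```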

jwills commented 9 months ago

ack, thank you @Mause! I swear I've made it work before, let me see if I can sort out how.

fRoDdYy commented 9 months ago

@Mause I tried it with environment variables too, but the issue persists.

@jwills Yeah, sorry, I made that mistake while typing; I've corrected it now.

Mause commented 9 months ago

Maybe double check the environment variable names? https://duckdb.org/docs/extensions/httpfs#configuration-1
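For reference, the profile in the original report reads the credentials via `env_var('AWS_ACCESS_KEY_ID')` and friends, so the shell side would look like this (placeholder values, not real credentials):

```shell
# Placeholder values -- the dbt profile pulls these in via env_var()
export AWS_ACCESS_KEY_ID=placeholder-key-id
export AWS_SECRET_ACCESS_KEY=placeholder-secret
export AWS_SESSION_TOKEN=placeholder-token
```

A typo in any of these names would leave the corresponding setting empty without an obvious error.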

NatElkins commented 5 months ago

I am also interested in this use case. I'm trying to perform a query on a DuckDB database in S3. I'm pretty sure my env vars are correct because I'm able to run a query like:

WITH t AS (
    SELECT *
    FROM iceberg_scan('s3a://path/to/files', allow_moved_paths = true)
)
SELECT *
FROM t;

without issue.

I've set the S3 settings in my DuckDB session like this:

SET s3_region = 'us-east-1';
SET s3_access_key_id = 'access_key_id';
SET s3_secret_access_key = 'secret_access_key';

and I've also executed the following:

INSTALL httpfs;
INSTALL iceberg;

LOAD httpfs;
LOAD iceberg;

(Iceberg not related directly to this problem, just noting it).

I also get the error:

Catalog Error: Cannot open database "s3://path/to/test.duckdb" in read-only mode: database does not exist
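An attach along these lines reproduces that error (my sketch, not necessarily the reporter's exact statement):

```sql
-- Sketch of a read-only attach against the S3 path
ATTACH 's3://path/to/test.duckdb' AS remote (READ_ONLY);
```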

I know my repro isn't using dbt-duckdb, but I'm also interested in this use case with dbt-duckdb (was just testing with SQL to verify behavior outside of dbt-duckdb).

jwills commented 5 months ago

Ah appreciate that @NatElkins -- I wonder if it works if you use the fsspec stuff? So as to treat S3 like a filesystem as opposed to trying to use the httpfs route?

NatElkins commented 5 months ago

@jwills It doesn't seem to work, although I may be doing something wrong. I have the creds defined in my ~/.aws/credentials file.

import duckdb
from fsspec import filesystem

# Route S3 access through fsspec instead of httpfs
duckdb.register_filesystem(filesystem('s3', anon=False))
con = duckdb.connect("s3://bucket/path/to/test.duckdb")
r1 = con.sql("select * from test")
print(r1)

The error I get is:

duckdb.connect("s3://bucket/path/to/test.duckdb")
duckdb.duckdb.IOException: IO Error: Cannot open file "/Users/nathanielelkins/Projects/dbt_test/s3://bucket/path/to/test.duckdb": No such file or directory

jwills commented 5 months ago

Yep, right there with you-- I cannot figure out how to get this to work right now. 😞

AldricVS commented 1 month ago

I stumbled across this thread while searching for the same issue, and I managed to make this work using the Secret provider system (see the S3 API Support page).

The issue seems to come from setting credentials with statements like SET s3_endpoint = ....

On version 0.10.3, creating a secret like this:

INSTALL httpfs;
LOAD httpfs;
CREATE SECRET secret1 (
    TYPE S3,
    ENDPOINT '***',
    KEY_ID '****',
    SECRET '****',
    REGION '****'
);

and then performing the ATTACH statement:

ATTACH 's3://path/to/dbfile'
AS db (READ_ONLY);

worked perfectly.
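Tying this back to dbt-duckdb: recent adapter versions document a `secrets:` profile key that should map onto the CREATE SECRET approach above. A sketch (field names and supported options worth double-checking against the dbt-duckdb README):

```yaml
outputs:
  dev:
    type: duckdb
    path: tmp/dbt.duckdb
    extensions:
      - httpfs
    secrets:
      - type: s3
        region: ap-south-1
        key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
        secret: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
    attach:
      - path: "s3://bucket-name/file-path/database.duckdb"
        read_only: true
target: dev
```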