duckdb / duckdb_delta

DuckDB extension for Delta Lake
MIT License

Error interacting with object store: Generic S3 error: Missing region #14

Open mvilrokx opened 1 month ago

mvilrokx commented 1 month ago

I've created a very simple, small Delta table on my MinIO instance. Whatever I do to query it, I get the following error:

IO Error: Hit DeltaKernel FFI error (from: get_default_client in DeltaScanScanBind): 
Hit error: 5 (GenericError) with message (Generic delta kernel error: Error interacting with object store: 
Generic S3 error: Missing region)

The table was created from the following parquet file (which is located in the same minio instance and I can query fine with duckdb):

D SELECT * FROM 's3://vilrokx/prices.parquet';
┌─────────┬─────────────────────┬───────┐
│ ticker  │        when         │ price │
│ varchar │      timestamp      │ int64 │
├─────────┼─────────────────────┼───────┤
│ APPL    │ 2001-01-01 00:00:00 │     1 │
│ APPL    │ 2001-01-01 00:01:00 │     2 │
│ APPL    │ 2001-01-01 00:02:00 │     3 │
│ MSFT    │ 2001-01-01 00:00:00 │     1 │
│ MSFT    │ 2001-01-01 00:01:00 │     2 │
│ MSFT    │ 2001-01-01 00:02:00 │     3 │
│ GOOG    │ 2001-01-01 00:00:00 │     1 │
│ GOOG    │ 2001-01-01 00:01:00 │     2 │
│ GOOG    │ 2001-01-01 00:02:00 │     3 │
└─────────┴─────────────────────┴───────┘

Tried:

D SET s3_endpoint='127.0.0.1:9000';
D SET s3_use_ssl=false;
D SET s3_url_style='path';
D SET s3_region='us-east-1';
D SELECT * FROM delta_scan('s3://delta-lake/prices_table');
IO Error: Hit DeltaKernel FFI error (from: get_default_client in DeltaScanScanBind): Hit error: 5 (GenericError) with message (Generic delta kernel error: Error interacting with object store: Generic S3 error: Missing region)
D SELECT ticker FROM delta_scan('s3://delta-lake/prices_table');
IO Error: Hit DeltaKernel FFI error (from: get_default_client in DeltaScanScanBind): Hit error: 5 (GenericError) with message (Generic delta kernel error: Error interacting with object store: Generic S3 error: Missing region)

As you can see, I am setting the region, even though I would assume that should not be needed for a local MinIO instance.

When I do the same with a Delta table on my local file system I get a different error:

D SELECT * FROM delta_scan('file:///prices_table') limit 1;
IO Error: Hit DeltaKernel FFI error (from: snapshot in DeltaScanScanBind): Hit error: 15 (MissingVersionError) with message (No table version found.)

mvilrokx commented 1 month ago

FYI, this works perfectly fine on the same Delta Table:

code:

from deltalake import DeltaTable

table_path = "prices_table"

df = DeltaTable(table_path).to_pandas()

print(df)

CLI:

❯ python read-delta-table.py
  ticker                when  price
0   APPL 2001-01-01 00:00:00      1
1   APPL 2001-01-01 00:01:00      2
2   APPL 2001-01-01 00:02:00      3
3   MSFT 2001-01-01 00:00:00      1
4   MSFT 2001-01-01 00:01:00      2
5   MSFT 2001-01-01 00:02:00      3
6   GOOG 2001-01-01 00:00:00      1
7   GOOG 2001-01-01 00:01:00      2
8   GOOG 2001-01-01 00:02:00      3

marsupialtail commented 1 month ago

Try this:

CREATE SECRET (TYPE S3, PROVIDER credential_chain);

samansmink commented 1 month ago

Yeah, the delta extension only works with DuckDB's secrets. So if you have a non-default region, you will need to create an S3 secret with the correct region. @marsupialtail's solution will work if you have an S3 config in a standard location that includes your region; otherwise you can use:

CREATE SECRET (
    TYPE S3,
    REGION 'my-region'
);
FROM delta_scan('....');

mvilrokx commented 1 month ago

What about all the other settings?

D SET s3_endpoint='127.0.0.1:9000';
D SET s3_use_ssl=false;
D SET s3_url_style='path';

When I set the secret, as per @samansmink's comment, I get:

IO Error: Hit DeltaKernel FFI error (from: snapshot in DeltaScanScanBind): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic S3 error: Error after 10 retries in 2.227643s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://s3.my-region.amazonaws.com/delta-lake/prices_table/_delta_log/_last_checkpoint): error trying to connect: dns error: failed to lookup address information: nodename nor servname provided, or not known)

The request goes to an amazonaws.com host, which suggests that it is not picking up the s3_endpoint I SET.
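
To make that reading of the error concrete, here is a small sketch that pulls the requested host out of the message and compares it with the configured endpoint. The URL fragment is copied from the error above; the helper name requested_host is illustrative only.

```python
import re
from urllib.parse import urlsplit

# Fragment of the error message reported above.
error_message = (
    "error sending request for url "
    "(https://s3.my-region.amazonaws.com/delta-lake/prices_table/"
    "_delta_log/_last_checkpoint)"
)


def requested_host(message: str):
    """Return the host of the first URL in the message, or None if there is none."""
    match = re.search(r"https?://[^\s)]+", message)
    return urlsplit(match.group(0)).netloc if match else None


host = requested_host(error_message)
configured_endpoint = "127.0.0.1:9000"  # value passed to SET s3_endpoint

print(host)
print(host == configured_endpoint)
```

The host is derived from the region in the secret, not from the MinIO endpoint, which matches the DNS failure for the placeholder region name.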

samansmink commented 1 month ago

Yeah, no, not yet. I'm only passing region and credentials at the moment; the rest should be easy to add (see src).

I will try to get to this soonish

mvilrokx commented 1 month ago

> Yea no not yet, I'm only passing region and credentials atm, the rest should be easy to add (see src)

Yeah, that would be needed if you want to support MinIO, for example.
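
For reference, DuckDB's S3 secrets already accept ENDPOINT, URL_STYLE and USE_SSL alongside REGION, KEY_ID and SECRET. The sketch below builds such a statement for the MinIO settings used in this thread. Whether delta_scan honors the extra fields depends on the extension version; at the time of this thread only region and credentials were forwarded, so treat this as an assumption about a later build. The function name build_s3_secret_sql is illustrative only.

```python
def build_s3_secret_sql(region, endpoint=None, use_ssl=None, url_style=None,
                        key_id=None, secret=None):
    """Build a CREATE SECRET statement for an S3-compatible store.

    Values are interpolated without escaping, so pass trusted input only.
    """
    options = ["TYPE S3", f"REGION '{region}'"]
    if key_id is not None:
        options.append(f"KEY_ID '{key_id}'")
    if secret is not None:
        options.append(f"SECRET '{secret}'")
    if endpoint is not None:
        options.append(f"ENDPOINT '{endpoint}'")
    if url_style is not None:
        options.append(f"URL_STYLE '{url_style}'")
    if use_ssl is not None:
        options.append(f"USE_SSL {'true' if use_ssl else 'false'}")
    body = ",\n    ".join(options)
    return f"CREATE SECRET (\n    {body}\n);"


# Settings taken from the SET statements earlier in the thread.
sql = build_s3_secret_sql(
    "us-east-1",
    endpoint="127.0.0.1:9000",
    use_ssl=False,
    url_style="path",
)
print(sql)
```

The resulting statement can be pasted into the DuckDB CLI before calling delta_scan, or run from Python with the duckdb package's execute function.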