duckdb / duckdb_delta

DuckDB extension for Delta Lake
MIT License

local minio s3 endpoint not used by delta extension #53

Closed mervynzhang closed 4 months ago

mervynzhang commented 4 months ago

Hello there,

I created a secret for a local MinIO instance, but the delta extension doesn't use the endpoint; only the region is used. Can anyone help troubleshoot? Or is this a bug?


D CREATE SECRET s1 (
      TYPE S3,
      USE_SSL 'false',
      URL_STYLE 'path',
      ENDPOINT '10.10.15.235:9000',
      REGION 'm1',
      KEY_ID 'minio',
      SECRET 'CHANGEME123');
┌─────────┐
│ Success │
│ boolean │
├─────────┤
│ true    │
└─────────┘
D SELECT * FROM delta_scan('s3://alluxio/logs/dt2') limit 1;
IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 's3://alluxio/logs/dt2/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic S3 error: Error after 10 retries in 2.086870444s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://s3.m1.amazonaws.com/alluxio/logs/dt2/_delta_log/_last_checkpoint))
mervynzhang commented 4 months ago

According to https://github.com/duckdb/duckdb_delta/commit/3e33b4967eac2cb4e6725eac52336fe0e319de59, MinIO is supported. Maybe I'm not using the latest version?

samansmink commented 4 months ago

Could you run UPDATE EXTENSIONS in DuckDB and report which version of the delta extension you are on?
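
For reference, a minimal sketch of that check in the DuckDB shell. It assumes a DuckDB release that supports UPDATE EXTENSIONS and exposes an extension_version column in duckdb_extensions(); the column selection is only illustrative.

```sql
-- Pull the latest builds of all installed extensions.
UPDATE EXTENSIONS;

-- Report the installed delta version (a release tag such as v0.1.0, or a commit hash).
SELECT extension_name, installed, loaded, extension_version, install_mode
FROM duckdb_extensions()
WHERE extension_name = 'delta';
```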

yaguirre commented 4 months ago

I was about to raise an issue for the same problem. I already set up the secret as documented at https://duckdb.org/docs/extensions/delta, and I decided to pass the key and secret explicitly to make sure it wasn't another issue.

CREATE SECRET delta_s4 (
    TYPE S3,
    KEY_ID '<key_id>', 
    SECRET '<secret>',
    REGION 'us-east-1', 
    SCOPE 's3://sample-delta-lake'
);

This is the status of the secret after executing FROM which_secret('s3://sample-delta-lake/delta_s3', 's3'):

┌──────────┬────────────┬─────────┐
│   name   │ persistent │ storage │
│ varchar  │  varchar   │ varchar │
├──────────┼────────────┼─────────┤
│ delta_s4 │ TEMPORARY  │ memory  │
└──────────┴────────────┴─────────┘

Now, whenever I try to run SELECT * FROM delta_scan('s3://sample-delta-lake/delta_s3'), I get the same error as @mervynzhang:

IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 's3://sample-delta-lake/delta_s3/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic S3 error: Error after 10 retries in 59.883701368s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://s3.us-east-1.amazonaws.com/sample-delta-lake/delta_s3/_delta_log/_last_checkpoint))

I did try running UPDATE EXTENSIONS as you recommended, and this is the output of that command:

┌────────────────┬────────────┬─────────────────────┬──────────────────┬─────────────────┐
│ extension_name │ repository │    update_result    │ previous_version │ current_version │
│    varchar     │  varchar   │       varchar       │     varchar      │     varchar     │
├────────────────┼────────────┼─────────────────────┼──────────────────┼─────────────────┤
│ aws            │ core       │ NO_UPDATE_AVAILABLE │ 42c78d3          │ 42c78d3         │
│ httpfs         │ core       │ NO_UPDATE_AVAILABLE │ 1f98600c2c       │ 1f98600c2c      │
│ delta          │ core       │ NO_UPDATE_AVAILABLE │ e526d8f          │ e526d8f         │
└────────────────┴────────────┴─────────────────────┴──────────────────┴─────────────────┘

However, I'm still getting the same error.

mervynzhang commented 4 months ago

> could you run update extensions in duckdb and report which version of delta you are on?

v0.1.0 works now. Thanks

samansmink commented 4 months ago

@yaguirre I'm going to assume this works for you now as well if you update to v0.1.0 by running update extensions again. If not, please let me know!

yaguirre commented 3 months ago

Hi @samansmink, I ran UPDATE EXTENSIONS again and now I'm using version v0.1.0, as shown below.

│ delta │ true │ true │ … │ v0.1.0 │ REPOSITORY │ core │

However, I'm still getting the same error. I'm not sure whether I'm missing some additional configuration parameter.

samansmink commented 3 months ago

@yaguirre note that the error could also arise from an invalid config. If you are sure your config is correct, please open a new issue with a reproduction and I will take a look!
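
One way to sanity-check the config before opening a new issue, sketched under the assumption that the secret was created in the current session: list the stored secrets and confirm which one DuckDB matches to the table path. The path below is the one used earlier in this thread, and secret values should appear redacted in the output.

```sql
-- Show every secret with its scope and its (redacted) settings,
-- which should include the region and any custom endpoint.
SELECT name, type, scope, secret_string
FROM duckdb_secrets();

-- Confirm which secret applies to the Delta table path.
FROM which_secret('s3://sample-delta-lake/delta_s3', 's3');
```

If the request URL in the error message points at a different host or region than the secret shows, the secret is probably not the one being applied.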

yaguirre commented 3 months ago

@samansmink I tried running delta_scan with the same configuration, but this time on a GitHub Codespaces machine, and it actually worked there. Maybe it has something to do with my local system environment. I'm running a WSL 2 system with the following characteristics.

Linux version 5.15.153.1-microsoft-standard-WSL2 (root@941d701f84f1) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Fri Mar 29 23:14:13 UTC 2024

As you can see, that is a linux_amd64-compatible system, so as I understand the supported platforms, it shouldn't have any issues. The GitHub Codespaces machine where it worked has these characteristics.

Linux version 6.5.0-1022-azure (buildd@lcy02-amd64-015) (x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #23~22.04.1-Ubuntu SMP Thu May 9 17:59:24 UTC 2024

Maybe the first one still has some kind of issue, just guessing, but anyway it's working now! Thanks for your support!! 😄

samansmink commented 3 months ago

@yaguirre I'm not sure how I would go about reproducing that easily. Given that it now seems to work for the most part, I will keep this closed. Feel free to reopen an issue if you keep having problems!