duckdb / duckdb_delta

DuckDB extension for Delta Lake
MIT License

Unable to read delta table using SAS token of ADLS Gen2 #101

Open BharatSingla12 opened 3 weeks ago

BharatSingla12 commented 3 weeks ago

Hi, I am trying to use an ADLS Gen2 SAS token to read and query a Delta table. I generated a folder-level SAS token. Now, I am attempting to access the Delta table using the following code snippet:

from adlfs.spec import AzureBlobFileSystem
import duckdb

file_system = AzureBlobFileSystem(account_name=storage_account_name, sas_token=sas_token)

connection = duckdb.connect()
connection.register_filesystem(file_system)

query = connection.sql('''
  SELECT * FROM delta_scan('abfss://blogo-container/Test/light_delta_lake')
''')

It gives me the following error: IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'abfss://blogo-container/Test/light_delta_lake/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic MicrosoftAzure error: Account must be specified)

However, if I try to read an individual Parquet file within the folder, it works for me:

query = connection.sql("""
  SELECT * FROM read_parquet('abfs://blogo-container/Test/light_delta_lake/part-00001-925210c8-29c5-40e2-86d8-c41ef2022bf9-c000.snappy.parquet')
""")

samansmink commented 1 day ago

Hey @BharatSingla12, thanks for reporting this.

The delta extension currently does not work with fsspec in DuckDB. This is expected behaviour with our current implementation. This issue will be fixed when the delta kernel allows delegating all IO to DuckDB.

You should be able to use the azure extension instead to query Delta tables on Azure; see https://github.com/duckdb/duckdb_delta/tree/main/test/sql/cloud/azure for examples.
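For reference, here is a rough sketch of what the azure-extension route might look like from Python. It is not taken from the linked tests and rests on several assumptions: that the SAS token can be passed through a connection string of the form `BlobEndpoint=...;SharedAccessSignature=...`, that the delta extension picks up a secret created this way, and that the table is addressed with an `az://` path. The helper names `build_azure_secret_sql` and `scan_delta_with_azure_extension` are made up for illustration.

```python
def build_azure_secret_sql(account_name: str, sas_token: str,
                           secret_name: str = "azure_sas") -> str:
    """Build a CREATE SECRET statement for DuckDB's azure extension.

    Assumption: the SAS token is accepted via a standard Azure Storage
    connection string (BlobEndpoint + SharedAccessSignature).
    """
    token = sas_token.lstrip("?")  # SAS tokens are often copied with a leading '?'
    conn = (
        f"BlobEndpoint=https://{account_name}.blob.core.windows.net;"
        f"SharedAccessSignature={token}"
    )
    conn = conn.replace("'", "''")  # escape single quotes for the SQL string literal
    return (
        f"CREATE OR REPLACE SECRET {secret_name} "
        f"(TYPE AZURE, CONNECTION_STRING '{conn}')"
    )


def scan_delta_with_azure_extension(table_path: str, account_name: str,
                                    sas_token: str):
    """Query a Delta table through the azure extension instead of fsspec."""
    import duckdb  # imported lazily so the SQL helper above has no dependencies

    con = duckdb.connect()
    for ext in ("azure", "delta"):
        con.install_extension(ext)
        con.load_extension(ext)
    con.execute(build_azure_secret_sql(account_name, sas_token))
    # Assumption: an az://<container>/<path> URL is resolved via the secret above.
    return con.sql(f"SELECT * FROM delta_scan('{table_path}')").fetchall()
```

With the values from the original report, the call would be something like `scan_delta_with_azure_extension('az://blogo-container/Test/light_delta_lake', storage_account_name, sas_token)`. No fsspec filesystem is registered here, since the delta extension does not use it.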