This is the experimental DuckDB extension for Delta. It is built using the (also experimental) Delta Kernel. The extension currently offers read support for Delta tables, both local and remote.
The supported platforms are:
linux_amd64, linux_amd64_gcc4, and linux_arm64
osx_amd64 and osx_arm64
windows_amd64
Support for the other DuckDB platforms is work in progress.
Note: This extension requires DuckDB v0.10.3 or higher.
This extension is distributed as a binary extension. To use it, simply use one of its functions from DuckDB and the extension will be autoloaded:
FROM delta_scan('s3://some/delta/table');
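If autoloading is disabled or unavailable in your setup, the extension can also be installed and loaded explicitly. The following is a minimal sketch using DuckDB's standard extension statements; whether you need it depends on your DuckDB configuration:

```sql
-- Install the extension from the default extension repository (one-time step).
INSTALL delta;
-- Load it into the current session.
LOAD delta;
-- Check that the DuckDB version meets the extension's minimum requirement.
SELECT version();
```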
To scan a local table, use the full path prefixed with file://:
FROM delta_scan('file:///some/path/on/local/machine');
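Since delta_scan is a table function, it can be used like any other relation in a query. The sketch below inspects the schema and then runs a filtered aggregation; the column names event_date and amount are hypothetical placeholders, not part of any real table:

```sql
-- Show the column names and types that the scan exposes.
DESCRIBE SELECT * FROM delta_scan('file:///some/path/on/local/machine');

-- Hypothetical columns: event_date and amount are placeholders.
SELECT event_date, sum(amount) AS total_amount
FROM delta_scan('file:///some/path/on/local/machine')
WHERE event_date >= DATE '2024-01-01'
GROUP BY event_date
ORDER BY event_date;
```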
Authentication to cloud storage through DuckDB Secrets is supported, for example with S3:
CREATE SECRET (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);
FROM delta_scan('s3://some/delta/table/with/auth');
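If a credential chain is not available, DuckDB's S3 secrets can also hold static credentials. This is a sketch under assumptions: the secret name, key values, region, and scope are placeholders, and this document does not confirm that the extension honors every secret field:

```sql
-- Assumption: static credentials instead of the credential chain.
-- The secret name, key values, region, and scope below are placeholders.
CREATE SECRET my_delta_s3 (
    TYPE S3,
    KEY_ID 'my_key_id',
    SECRET 'my_secret_key',
    REGION 'us-east-1',
    SCOPE 's3://some/delta'
);

-- List the configured secrets (sensitive values are redacted by default).
FROM duckdb_secrets();
```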
For Azure:
CREATE SECRET (
TYPE AZURE,
PROVIDER CREDENTIAL_CHAIN,
CHAIN 'cli',
ACCOUNT_NAME 'mystorageaccount'
);
FROM delta_scan('abfss://some/delta/table/with/auth');
For GCS, create HMAC keys and declare a secret, as described at https://duckdb.org/docs/guides/network_cloud_storage/gcs_import.html:
CREATE SECRET (
TYPE GCS,
KEY_ID 'xxxx',
SECRET 'yyy'
);
While still experimental, many scanning features and optimizations are already supported in this extension because it reuses most of DuckDB's regular Parquet scanning logic.
More features coming soon!
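One way to see how a particular query is executed is to inspect its plan with EXPLAIN. The sketch below uses a hypothetical column named id; what the plan shows for the scan (for example, whether the filter or projection appears inside the scan node) should be checked rather than assumed:

```sql
-- Hypothetical column: id is a placeholder.
EXPLAIN
SELECT id
FROM delta_scan('file:///some/path/on/local/machine')
WHERE id > 100;
```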
See the Extension Template for generic build instructions.
There are various tests available for the delta extension:
/test/sql/dat
/test/sql/delta_kernel_rs
tests/sql/generated (generated using delta-rs, PySpark, and DuckDB)
To run the first 2 sets of tests:
make test_debug
or in release mode:
make test
To also run the tests on generated data:
make generate-data
GENERATED_DATA_AVAILABLE=1 make test