dlt version
1.3.1a1
Describe the problem
Using default Azure/AWS credentials leads to an error when using the delta table format on the filesystem destination. It works fine when using e.g. parquet instead. For GCP this was recently fixed in https://github.com/dlt-hub/dlt/issues/1965.
Expected behavior
Default Azure/AWS credentials are handled properly and can be used to authenticate.
Steps to reproduce
For Azure:
import os
import dlt
# set dlt env vars
os.environ["CREDENTIALS__AZURE_STORAGE_ACCOUNT_NAME"] = "dltdata"
os.environ["BUCKET_URL"] = "az://dlt-ci-test-bucket"
# set default Azure credentials
os.environ["AZURE_TENANT_ID"] = "MY_TENANT_ID"
os.environ["AZURE_CLIENT_ID"] = "MY_CLIENT_ID"
os.environ["AZURE_CLIENT_SECRET"] = "MY_CLIENT_SECRET"
pipe = dlt.pipeline("my_pipe", destination="filesystem")
pipe.run([{"foo": 1}], table_name="my_table", table_format="delta")
Traceback:
2024-11-13 10:28:18,585|[ERROR]|66899|140167369590464|dlt|reference.py|run_managed:431|Transient exception in job my_table.50fe02e280.reference in file /home/j/.dlt/pipelines/my_pipe/load/normalized/1731479294.7018855/started_jobs/my_table.50fe02e280.0.reference
Traceback (most recent call last):
File "/home/j/repos/dlt/dlt/common/destination/reference.py", line 422, in run_managed
self.run()
File "/home/j/repos/dlt/dlt/destinations/impl/filesystem/filesystem.py", line 143, in run
delta_table = self._delta_table()
File "/home/j/repos/dlt/dlt/destinations/impl/filesystem/filesystem.py", line 187, in _delta_table
if DeltaTable.is_deltatable(self.make_remote_url(), storage_options=self._storage_options):
File "/home/j/.cache/pypoetry/virtualenvs/dlt-2tG_aB2A-py3.9/lib/python3.9/site-packages/deltalake/table.py", line 436, in is_deltatable
return RawDeltaTable.is_deltatable(table_uri, storage_options)
TypeError: argument 'storage_options': 'bool' object cannot be converted to 'PyString'
2024-11-13 10:28:18,586|[WARNING]|66899|140167860366208|dlt|load.py|complete_jobs:430|Job for my_table.50fe02e280.reference retried in load 1731479294.7018855 with message argument 'storage_options': 'bool' object cannot be converted to 'PyString'
[... the same traceback and retry warning repeat for jobs my_table.50fe02e280.1 through my_table.50fe02e280.4 ...]
Traceback (most recent call last):
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 605, in load
runner.run_pool(load_step.config, load_step)
File "/home/j/repos/dlt/dlt/common/runners/pool_runner.py", line 91, in run_pool
while _run_func():
File "/home/j/repos/dlt/dlt/common/runners/pool_runner.py", line 84, in _run_func
run_metrics = run_f.run(cast(TExecutor, pool))
File "/home/j/repos/dlt/dlt/load/load.py", line 638, in run
self.load_single_package(load_id, schema)
File "/home/j/repos/dlt/dlt/load/load.py", line 597, in load_single_package
raise pending_exception
dlt.load.exceptions.LoadClientJobRetry: Job for my_table.50fe02e280.reference had 5 retries which a multiple of 5. Exiting retry loop. You can still rerun the load package to retry this job. Last failure message was argument 'storage_options': 'bool' object cannot be converted to 'PyString'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/j/repos/dlt/mre.py", line 16, in <module>
pipe.run([{"foo": 1}], table_name="my_table", table_format="delta")
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 223, in _wrap
step_info = f(self, *args, **kwargs)
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 272, in _wrap
return f(self, *args, **kwargs)
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 744, in run
return self.load(destination, dataset_name, credentials=credentials)
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 223, in _wrap
step_info = f(self, *args, **kwargs)
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 163, in _wrap
return f(self, *args, **kwargs)
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 272, in _wrap
return f(self, *args, **kwargs)
File "/home/j/repos/dlt/dlt/pipeline/pipeline.py", line 612, in load
raise PipelineStepFailed(
dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage load when processing package 1731479294.7018855 with exception:
<class 'dlt.load.exceptions.LoadClientJobRetry'>
Job for my_table.50fe02e280.reference had 5 retries which a multiple of 5. Exiting retry loop. You can still rerun the load package to retry this job. Last failure message was argument 'storage_options': 'bool' object cannot be converted to 'PyString'
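For context on the failure mode: deltalake's `storage_options` argument must be a `Dict[str, str]`, so the `TypeError` above indicates a non-string value (here a `bool`) ended up in `self._storage_options`. A minimal pure-Python sketch of the mismatch, without touching deltalake itself (the `use_default_credentials` key is hypothetical, chosen only for illustration):

```python
# deltalake requires storage_options to map str -> str; a raw bool value
# produces "'bool' object cannot be converted to 'PyString'" at the
# Python/Rust boundary. This helper mimics that check in pure Python.
def validate_storage_options(opts):
    bad = {k: v for k, v in opts.items() if not isinstance(v, str)}
    if bad:
        bad_type = type(next(iter(bad.values()))).__name__
        raise TypeError(
            f"argument 'storage_options': '{bad_type}' object "
            "cannot be converted to 'PyString'"
        )
    return opts

# a bool sneaking in (as default-credential handling might do) fails:
try:
    validate_storage_options({"account_name": "dltdata", "use_default_credentials": True})
except TypeError as e:
    print(e)

# the same options with stringified values pass:
validate_storage_options({"account_name": "dltdata", "use_default_credentials": "true"})
```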
Operating system
Linux
Runtime environment
Local
Python version
3.9
dlt data source
No response
dlt destination
Filesystem & buckets
Other deployment details
No response
Additional information
I did not test AWS default credentials, but I assume they won't work either since I never wrote logic to handle them.
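A possible direction for a fix, sketched here only (this is not dlt's actual implementation): coerce whatever the credential resolution produces into the string-to-string mapping deltalake expects before calling `DeltaTable.is_deltatable`. The function name and option keys below are hypothetical:

```python
# hypothetical helper: coerce credential-derived options into the
# Dict[str, str] shape deltalake's storage_options requires
def to_delta_storage_options(options: dict) -> dict:
    coerced = {}
    for key, value in options.items():
        if value is None:
            continue  # drop unset options entirely
        if isinstance(value, bool):
            coerced[key] = str(value).lower()  # True -> "true"
        else:
            coerced[key] = str(value)
    return coerced

print(to_delta_storage_options({"account_name": "dltdata", "use_default_credentials": True}))
# {'account_name': 'dltdata', 'use_default_credentials': 'true'}
```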