Open namtab-ma-i opened 3 weeks ago
External DuckDB code and error:
profiles.yml
outputs:
dev:
type: duckdb
attach:
- path: abfs://<my_container>/dev.duckdb
alias: snapshot_db
extensions:
- azure
- parquet
filesystems:
- fs: abfs
anon: false
account_name: "{{ env_var('ADLS_STORAGE_ACCOUNT') }}"
account_key: "{{ env_var('ADLS_STORAGE_ACCOUNT_KEY') }}"
This results in
Encountered an error:
Runtime Error
NotImplementedError: File mode not supported
At:
/IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/adlfs/spec.py(1957): _ _ init_ _
/IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/adlfs/spec.py(1833): _open
/IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/fsspec/spec.py(1298): open
Yeah snapshots require table mutations (i.e., UPDATE
statements), which aren't supported for external materializations. Blob-store based .duckdb
files are supported, but alas, they are read-only (the TL;DR here is that both of these features require the ability to do random writes in DuckDB, which is the only thing blob stores are not designed to do.)
If you can copy the .duckdb file to a local FS, you can do all of these things to it and then write the resulting .duckdb file back out to blob storage once you're done, which is the best option I have for you at the moment.
I use in-memory DuckDB and my sources are all located in Azure Blob Storage. I set models to external materialize to the same storage and it works without issues. However, it seems that I can't use an external materialize option for snapshots.
Is there any way to do this? Or is there a way to set the .duckdb path to be in the blob storage?