Background
We aim to add backend-less and server-less write support for Iceberg tables, in a similar way to how we do it for Delta tables: make table_format="iceberg" recognized by the filesystem destination. From the user's PoV this means:
writing and reading Iceberg tables without a query engine as a separate backend
maintaining and evolving the schema without a catalog as a separate backend
We want to use pyiceberg. This limits the write disposition to append and replace (until upsert is implemented in pyiceberg). We also won't support vacuum, compact, or z-order operations on the tables.
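The append/replace restriction can be made explicit at the write call site. The following is a minimal sketch, assuming a hypothetical helper write_iceberg that receives an already-loaded pyiceberg Table. pyiceberg tables expose append() and overwrite() (both taking a pyarrow Table), so "append" and "replace" map onto them directly, and any other disposition (e.g. "merge") is rejected until upsert is available. A Protocol stands in for the real Table type so the sketch runs without pyiceberg installed.

```python
from typing import Any, Protocol


# The two pyiceberg Table write methods this sketch relies on. Both take a
# pyarrow Table; Any keeps the sketch runnable without pyarrow installed.
class IcebergWritable(Protocol):
    def append(self, df: Any) -> None: ...

    def overwrite(self, df: Any) -> None: ...


SUPPORTED_DISPOSITIONS = ("append", "replace")


def write_iceberg(table: IcebergWritable, df: Any, write_disposition: str) -> None:
    """Hypothetical helper: route a dlt write disposition to a pyiceberg write."""
    if write_disposition == "append":
        table.append(df)  # commits a new snapshot that adds the new data files
    elif write_disposition == "replace":
        table.overwrite(df)  # commits a new snapshot replacing the table contents
    else:
        # "merge" (upsert) has no pyiceberg counterpart yet, so fail early
        raise ValueError(
            f"write_disposition={write_disposition!r} is not supported for iceberg; "
            f"use one of {SUPPORTED_DISPOSITIONS}"
        )
```

With real pyiceberg, table would presumably come from catalog.load_table("namespace.table_name") and df would be a pyarrow.Table.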
Tasks
[ ] maintain a "technical" catalog: one SQLite file per table, stored together with the table data
[ ] to write a table, lock the SQLite file with TransactionalFile, pull it locally, use it with pyiceberg, and then write it back
[ ] use pyiceberg to append to and replace tables, create partitions, evolve the schema, etc.
[ ] support all buckets via fsspec
[ ] as for Delta, expose the pyiceberg table object for a given table: read-only (catalog without a lock) and read/write (with a lock on the catalog, possibly via a context manager). This will allow people to, e.g., delete or rebuild partitions on a table.
[ ] make the filesystem sql_client create views on Iceberg tables via DuckDB
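The lock / pull / write-back step and the read-only vs. read/write access described in the tasks could be sketched as two context managers. This is a minimal sketch on the local filesystem only: the O_EXCL lock file is a hypothetical stand-in for TransactionalFile (whose API is not reproduced here), plain file copies stand in for fsspec transfers, and the names locked_catalog and readonly_catalog are hypothetical.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Iterator


@contextlib.contextmanager
def readonly_catalog(remote_catalog_path: str) -> Iterator[str]:
    """Pull a private local copy of the catalog without locking; changes are discarded."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, os.path.basename(remote_catalog_path))
        shutil.copyfile(remote_catalog_path, local_path)
        yield local_path


@contextlib.contextmanager
def locked_catalog(remote_catalog_path: str) -> Iterator[str]:
    """Lock the catalog, yield a local copy, and write it back if the block succeeds."""
    lock_path = remote_catalog_path + ".lock"
    # Hypothetical stand-in for TransactionalFile: O_EXCL makes os.open raise
    # FileExistsError if another writer already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            local_path = os.path.join(tmp, os.path.basename(remote_catalog_path))
            if os.path.exists(remote_catalog_path):
                shutil.copyfile(remote_catalog_path, local_path)  # pull
            yield local_path  # the caller works on local_path (e.g. with pyiceberg)
            if os.path.exists(local_path):
                shutil.copyfile(local_path, remote_catalog_path)  # write back
    finally:
        # Release the lock whether or not the caller's block raised
        os.close(fd)
        os.remove(lock_path)
```

Inside the with block, a pyiceberg SqlCatalog could presumably point at the local copy via uri=f"sqlite:///{local_path}"; that wiring is omitted so the sketch runs on the standard library alone. If the caller's block raises, nothing is written back and the lock is still released.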
Perhaps we can use an in-memory SQLite database instead of persisting the file to disk.
If I understand correctly, at its core the catalog only maps a table name to its table metadata (which lives on the filesystem), so we can populate the in-memory SQLite database with this mapping based on dlt metadata.
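The in-memory idea can be illustrated with the standard library: a throwaway SQLite database holding a table name to metadata location mapping. This is a sketch only. The table layout below is hypothetical (it is not pyiceberg's SqlCatalog schema), and the input dict stands in for whatever dlt metadata would supply. With pyiceberg itself, the equivalent would presumably be a SqlCatalog created with uri="sqlite:///:memory:" followed by register_table(identifier, metadata_location) for each table.

```python
import sqlite3
from typing import Dict, Optional


def build_memory_catalog(metadata_locations: Dict[str, str]) -> sqlite3.Connection:
    """Create a throwaway in-memory catalog from a table name -> metadata file mapping."""
    conn = sqlite3.connect(":memory:")  # nothing is persisted to disk
    # Hypothetical, simplified layout: one row per table
    conn.execute(
        "CREATE TABLE iceberg_tables ("
        " table_name TEXT PRIMARY KEY,"
        " metadata_location TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO iceberg_tables (table_name, metadata_location) VALUES (?, ?)",
        metadata_locations.items(),
    )
    conn.commit()
    return conn


def lookup_metadata(conn: sqlite3.Connection, table_name: str) -> Optional[str]:
    """Return the metadata file location registered for table_name, if any."""
    row = conn.execute(
        "SELECT metadata_location FROM iceberg_tables WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None
```

Because the database lives only in memory, it would be rebuilt on every load, which removes the need to lock and write back a catalog file; the source of truth stays in the metadata files on the filesystem.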