ZENOTME opened 2 weeks ago
Exactly. Internally, you want to stack the changes together. For example, if you add a new field and then write data within a single transaction, the write should take the latest schema into account. We do this in PyIceberg here: https://github.com/apache/iceberg-python/blob/03a0d65ac05d556d0815e61a016effc2b8993702/pyiceberg/table/__init__.py#L715
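A minimal sketch of the "add a field, then write" case, assuming a transaction that keeps a local copy of the metadata and applies each change to it before recording the update. `TableMetadata`, `TableUpdate`, `Transaction`, `add_field`, and `write` here are hypothetical, heavily simplified stand-ins, not iceberg-rust's actual types or API.

```rust
use std::collections::BTreeMap;

// Hypothetical, heavily simplified stand-ins; not iceberg-rust's real types.
#[derive(Clone, Debug, Default)]
struct TableMetadata {
    // Field name -> field type, kept as plain strings for brevity.
    schema: BTreeMap<String, String>,
}

#[derive(Clone, Debug)]
enum TableUpdate {
    AddField { name: String, ty: String },
}

struct Transaction {
    // Metadata with every staged update already applied.
    current: TableMetadata,
    // Updates that would be sent to the catalog on commit.
    updates: Vec<TableUpdate>,
}

impl Transaction {
    fn new(base: &TableMetadata) -> Self {
        Self { current: base.clone(), updates: Vec::new() }
    }

    fn add_field(&mut self, name: &str, ty: &str) {
        // Apply the change locally first so later actions in this transaction see it.
        self.current.schema.insert(name.to_string(), ty.to_string());
        self.updates.push(TableUpdate::AddField { name: name.to_string(), ty: ty.to_string() });
    }

    // A write validates its columns against the latest staged schema,
    // not against the schema the transaction started from.
    fn write(&self, columns: &[&str]) -> Result<(), String> {
        for column in columns {
            if !self.current.schema.contains_key(*column) {
                return Err(format!("unknown column: {column}"));
            }
        }
        Ok(())
    }
}
```

With a base schema containing only `id`, `write(&["id", "name"])` fails until `add_field("name", "string")` is called on the same transaction, after which it succeeds, while the base metadata still has no `name` field.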
We can implement `update_table_metadata` based on https://github.com/apache/iceberg-rust/pull/587. cc @liurenjie1024 @Xuanwo @Fokko @c-thiel
Hi @ZENOTME, could you elaborate on this? I'm a bit confused about the proposal.
Currently, a transaction doesn't reflect its updates immediately, so the updates can't be stacked together. For example:
// `table` is a v1 table
let tx = Transaction::new(&table);
// This ends up sending two UpgradeFormatVersion updates to the catalog
tx.upgrade_table_version(FormatVersion::V2).unwrap()
    .upgrade_table_version(FormatVersion::V2).unwrap()
    .commit(&catalog)
    .await
In PyIceberg, by contrast, the code above would send only one UpgradeFormatVersion: each update is applied to the local metadata first, so the second call sees that the table metadata has already been upgraded.
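A minimal sketch of the PyIceberg-style behaviour, assuming the transaction compares each requested upgrade against its locally updated metadata rather than the original table. All types and methods below are hypothetical simplifications, not iceberg-rust's real `Transaction` API.

```rust
// Hypothetical, simplified types; not iceberg-rust's real API.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum FormatVersion {
    V1,
    V2,
}

#[derive(Clone, Debug)]
struct TableMetadata {
    format_version: FormatVersion,
}

#[derive(Clone, Debug, PartialEq)]
enum TableUpdate {
    UpgradeFormatVersion { format_version: FormatVersion },
}

struct Transaction {
    // Local copy of the metadata with all staged updates applied.
    current: TableMetadata,
    // Updates to send to the catalog on commit.
    updates: Vec<TableUpdate>,
}

impl Transaction {
    fn new(base: &TableMetadata) -> Self {
        Self { current: base.clone(), updates: Vec::new() }
    }

    fn upgrade_table_version(mut self, format_version: FormatVersion) -> Result<Self, String> {
        // Compare against the locally updated metadata, not the original table.
        let current = self.current.format_version;
        if format_version < current {
            return Err(format!("cannot downgrade from {current:?} to {format_version:?}"));
        }
        if format_version > current {
            // Apply locally, then record the update for the catalog.
            self.current.format_version = format_version;
            self.updates.push(TableUpdate::UpgradeFormatVersion { format_version });
        }
        // If the versions are equal (e.g. an earlier call in this transaction
        // already upgraded it), nothing is recorded.
        Ok(self)
    }
}
```

Chaining two `upgrade_table_version(FormatVersion::V2)` calls on a v1 table records a single `UpgradeFormatVersion`, because the second call already sees V2 in the local metadata.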
We already have a check to avoid such duplicated updates. For metastore-backed tables, transaction actions are supposed to be applied locally, after which the metastore pointer is updated. For the REST catalog, the updates should be sent to the REST catalog server.
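One way such a duplicate check could work, sketched under the assumption that a transaction rejects a new update whose enum variant is already staged. `TableUpdate`, `Transaction`, and `append_update` are hypothetical names for illustration; the comparison uses the standard `std::mem::discriminant`.

```rust
use std::mem::discriminant;

// Hypothetical, simplified update type; not iceberg-rust's real `TableUpdate`.
#[derive(Debug)]
enum TableUpdate {
    UpgradeFormatVersion { format_version: u8 },
    SetProperties { key: String, value: String },
}

#[derive(Default)]
struct Transaction {
    updates: Vec<TableUpdate>,
}

impl Transaction {
    // Reject an update if one of the same variant is already staged.
    fn append_update(&mut self, update: TableUpdate) -> Result<(), String> {
        if self.updates.iter().any(|u| discriminant(u) == discriminant(&update)) {
            return Err(format!("duplicate update of the same kind: {update:?}"));
        }
        self.updates.push(update);
        Ok(())
    }
}
```

Note that a check like this turns the second upgrade into an error rather than a no-op, which is the behaviour difference from PyIceberg discussed above.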
Why should the updates be sent to the REST catalog server for the REST catalog? 🤔 According to the PyIceberg API, it seems possible to create a transaction without auto-commit, which means we could also apply transaction actions locally for the REST catalog. (Am I missing something here?)
_Originally posted by @Fokko in https://github.com/apache/iceberg-rust/pull/349#discussion_r1579449633_
A transaction should be able to reflect each update immediately. Following PyIceberg, we can provide an `apply` interface that updates the table metadata.
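A minimal sketch of what such an `apply` interface might look like, assuming each update knows how to mutate a local copy of the metadata and the transaction records an update only after applying it succeeds. `TableMetadata`, `TableUpdate::apply`, and `Transaction::stage` are hypothetical names; this is not the design from PR 587 or iceberg-rust's actual API.

```rust
use std::collections::HashMap;

// Hypothetical simplified metadata; not iceberg-rust's real `TableMetadata`.
#[derive(Clone, Debug, Default, PartialEq)]
struct TableMetadata {
    format_version: u8,
    properties: HashMap<String, String>,
}

#[derive(Clone, Debug)]
enum TableUpdate {
    UpgradeFormatVersion { format_version: u8 },
    SetProperties { updates: HashMap<String, String> },
    RemoveProperties { removals: Vec<String> },
}

impl TableUpdate {
    // Hypothetical `apply`: mutate the local metadata the way the catalog would.
    fn apply(&self, metadata: &mut TableMetadata) -> Result<(), String> {
        match self {
            TableUpdate::UpgradeFormatVersion { format_version } => {
                if *format_version < metadata.format_version {
                    return Err(format!("cannot downgrade to v{format_version}"));
                }
                metadata.format_version = *format_version;
            }
            TableUpdate::SetProperties { updates } => {
                metadata.properties.extend(updates.clone());
            }
            TableUpdate::RemoveProperties { removals } => {
                for key in removals {
                    metadata.properties.remove(key);
                }
            }
        }
        Ok(())
    }
}

struct Transaction {
    // Local metadata with all staged updates applied.
    current: TableMetadata,
    // Updates recorded in order, e.g. for a REST catalog commit.
    updates: Vec<TableUpdate>,
}

impl Transaction {
    fn new(base: &TableMetadata) -> Self {
        Self { current: base.clone(), updates: Vec::new() }
    }

    // Apply the update locally first; record it only if that succeeds.
    fn stage(mut self, update: TableUpdate) -> Result<Self, String> {
        update.apply(&mut self.current)?;
        self.updates.push(update);
        Ok(self)
    }
}
```

With this shape, a REST catalog commit could send the recorded `updates`, while a metastore-backed table could persist `current` and then swap the metastore pointer, matching the split described earlier in the thread.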