apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0
478 stars 176 forks source link

[discuss] `Transaction` API's `autocommit` #1253

Open kevinjqliu opened 3 weeks ago

kevinjqliu commented 3 weeks ago

Porting over from #1246

Can be a potential footgun (https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439024843) Autocommit usage (https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439780764)

However, there may still be some concerns around this since Transaction is a public class. If this is the case, I think we can start from making the parameter "private" (autocommit -> _autocommit) and/or adding some doc to explain the usage.

kevinjqliu commented 3 weeks ago

The idea is to make the code simpler if we only want to evolve schema/spec/... i.e.

with table.update_schema() as update:
update.add_column("some_field", IntegerType(), "doc")

instead of another with..transaction wrapper

with table.transaction() as transaction:
with transaction.update_schema() as update_schema:
update.add_column("some_other_field", IntegerType(), "doc")

....

What about moving the autocommit logic out of Transaction and into the class that uses it instead?

For example, UpdateSchema can implement __enter__ and __exit__ to commit the transaction automatically?

Currently, these classes use autocommit=True: