apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Removing then adding the same partition in update_spec fails #1131

Closed lkindere closed 2 weeks ago

lkindere commented 3 weeks ago

Apache Iceberg version

0.7.1 (latest release)

Please describe the bug 🐞

When adding a field to a partition spec, the call fails if the same field previously existed and was removed, unless the partition field name is defined explicitly.

For example, this would fail:

    table.update_spec().add_identity("VendorID").commit()
    table.update_spec().remove_field("VendorID").commit()
    table.update_spec().add_identity("VendorID").commit()

This would also fail:

    table.update_spec().add_field("VendorID", IdentityTransform()).commit()
    table.update_spec().remove_field("VendorID").commit()
    table.update_spec().add_field("VendorID", IdentityTransform()).commit()

This would not fail, because the partition field name is defined:

    table.update_spec().add_field("VendorID", IdentityTransform(), "VendorID").commit()
    table.update_spec().remove_field("VendorID").commit()
    table.update_spec().add_field("VendorID", IdentityTransform(), "VendorID").commit()
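Until a fix is released, the working variant above can be wrapped in a small helper so the partition field name is never left unset. This is only a sketch of a workaround, not part of PyIceberg: the function name `add_field_with_explicit_name` is hypothetical, and falling back to the source column name is an assumption that matches the identity-transform example in this report (other transforms may warrant a different name). It relies only on the positional `add_field(source_column_name, transform, partition_field_name)` call shown above.

```python
from typing import Any, Optional


def add_field_with_explicit_name(
    update_spec: Any,
    source_column: str,
    transform: Any,
    partition_field_name: Optional[str] = None,
) -> Any:
    """Call add_field with a partition field name that is never None.

    Hypothetical helper: `update_spec` is expected to be the object
    returned by table.update_spec(). If no name is given, the source
    column name is used (an assumption suited to identity transforms).
    """
    name = partition_field_name if partition_field_name is not None else source_column
    # Pass the name positionally, as in the working example from the report.
    return update_spec.add_field(source_column, transform, name)


# Possible usage against a real table (requires pyiceberg and a loaded table):
#   from pyiceberg.transforms import IdentityTransform
#   add_field_with_explicit_name(table.update_spec(), "VendorID", IdentityTransform()).commit()
```

The helper returns whatever `add_field` returns, so `.commit()` can still be chained as in the examples above.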

Example trace:

Traceback (most recent call last):
  File "C:\Users\a\PycharmProjects\pythonProject\test.py", line 24, in <module>
    table.update_spec().add_identity("VendorID").commit()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a\PycharmProjects\pythonProject\.venv\Lib\site-packages\pyiceberg\table\__init__.py", line 3682, in add_identity
    return self.add_field(source_column_name, IdentityTransform(), None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a\PycharmProjects\pythonProject\.venv\Lib\site-packages\pyiceberg\table\__init__.py", line 3657, in add_field
    new_field = self._partition_field((bound_ref.field.field_id, transform), partition_field_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a\PycharmProjects\pythonProject\.venv\Lib\site-packages\pyiceberg\table\__init__.py", line 3833, in _partition_field
    return PartitionField(source_id, field_key[1], transform, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a\PycharmProjects\pythonProject\.venv\Lib\site-packages\pyiceberg\partitioning.py", line 111, in __init__
    super().__init__(**data)
  File "C:\Users\a\PycharmProjects\pythonProject\.venv\Lib\site-packages\pydantic\main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for PartitionField
name
  Field required [type=missing, input_value={'source-id': 1, 'field-i...m': IdentityTransform()}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing

sungwy commented 2 weeks ago

Hi @lkindere thank you very much for reporting this issue! I've put up a PR to fix this, and I've created tests based on your examples here: https://github.com/apache/iceberg-python/pull/1161