apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Optional Schema Check for `add_files` #869

Open syun64 opened 4 days ago

syun64 commented 4 days ago

Feature Request / Improvement

Many folks have been reaching out about the usage of `add_files`, which suggests there is real demand from users who would prefer not to rewrite their Parquet files and just want to side-load them into an Iceberg table.

Although the documentation notes that this is a feature for expert users (as has been the case with Iceberg's other migration procedures), its ease of use seems to appeal to users of all levels.

Therefore, I think it would be great to introduce optional guardrails for the API.

Introducing an optional schema check would be an easy first step.
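
To make the idea concrete, here is a minimal sketch of what such a schema check could look like. It is not PyIceberg's implementation: `check_schema_compatible` is a hypothetical helper, and schemas are simplified to plain `{field name: type name}` dictionaries rather than real Iceberg or Parquet schema objects.

```python
from typing import Dict, List


def check_schema_compatible(
    table_schema: Dict[str, str], file_schema: Dict[str, str]
) -> List[str]:
    """Hypothetical helper: list the ways a file's schema disagrees with the table's.

    Both schemas are simplified to {field name: type name} mappings.
    An empty list means every field in the file exists in the table
    with the same type.
    """
    problems = []
    for name, file_type in file_schema.items():
        if name not in table_schema:
            problems.append(f"field '{name}' is not in the table schema")
        elif table_schema[name] != file_type:
            problems.append(
                f"field '{name}': table has {table_schema[name]}, file has {file_type}"
            )
    return problems


table = {"id": "long", "name": "string"}
print(check_schema_compatible(table, {"id": "long"}))
print(check_schema_compatible(table, {"id": "string", "extra": "int"}))
```

A real check would also need to consider nested types, nullability, and field IDs, which this sketch deliberately leaves out.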

Fokko commented 2 days ago

@syun64 What do you think of enabling schema validation by default, with the ability to turn it off? If I understand the implications correctly, adding a file to a table could potentially brick it.
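
As a rough illustration of validation that is on by default but can be switched off, the sketch below uses a hypothetical `add_files_sketch` function and a hypothetical `check_schema` flag. Neither name comes from PyIceberg, and the table is modelled as a plain list of registered paths with schemas as simple dictionaries.

```python
from typing import Dict, List


class SchemaMismatchError(ValueError):
    """Hypothetical error raised when a file's schema differs from the table's."""


def add_files_sketch(
    table_schema: Dict[str, str],
    registered_files: List[str],
    file_path: str,
    file_schema: Dict[str, str],
    check_schema: bool = True,  # assumption: validation on by default, opt-out allowed
) -> None:
    """Register file_path with the table, validating its schema unless disabled."""
    if check_schema and file_schema != table_schema:
        raise SchemaMismatchError(
            f"{file_path}: file schema {file_schema} does not match table schema {table_schema}"
        )
    registered_files.append(file_path)


table_schema = {"id": "long"}
files: List[str] = []

# Matching schema: accepted with the default setting.
add_files_sketch(table_schema, files, "a.parquet", {"id": "long"})

# Mismatched schema: rejected by default, accepted only when the check is turned off.
try:
    add_files_sketch(table_schema, files, "b.parquet", {"id": "string"})
except SchemaMismatchError as err:
    print("rejected:", err)
add_files_sketch(table_schema, files, "b.parquet", {"id": "string"}, check_schema=False)
```

Defaulting to the safe behaviour means a user has to opt out explicitly before a mismatched file can reach the table.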

syun64 commented 2 days ago

Yes, that was what I was thinking as well.