apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

[bug] Cannot perform table scan on V1 table #1194

Open kevinjqliu opened 1 month ago

kevinjqliu commented 1 month ago

Apache Iceberg version

main (development)

Please describe the bug 🐞

While working with a V1 table, I noticed a few bugs that prevent a table scan on V1 tables.

  1. Reading the manifest list defaults to V2 https://github.com/apache/iceberg-python/blob/620ad9f64307193ec0d26846b48f4e063b5da904/pyiceberg/manifest.py#L635

  2. Accessing fields that are not available in V1: https://github.com/apache/iceberg-python/blob/620ad9f64307193ec0d26846b48f4e063b5da904/pyiceberg/table/__init__.py#L1311 According to the spec, the content field does not exist in V1. Similar accesses occur in multiple places.
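
To make the first point concrete, here is a minimal sketch (not PyIceberg's actual code) of picking the manifest-list field set from the table's format version instead of falling back to V2. The field lists are abbreviated, and the names `MANIFEST_LIST_FIELDS` and `manifest_list_fields` are hypothetical.

```python
from typing import Dict, Tuple

# Abbreviated subset of manifest-list fields; the spec defines more.
_V1_FIELDS: Tuple[str, ...] = (
    "manifest_path",
    "manifest_length",
    "partition_spec_id",
    "added_snapshot_id",
)
# Fields that only exist from format version 2 onwards.
_V2_ONLY_FIELDS: Tuple[str, ...] = (
    "content",
    "sequence_number",
    "min_sequence_number",
)

# Hypothetical lookup table keyed by format version.
MANIFEST_LIST_FIELDS: Dict[int, Tuple[str, ...]] = {
    1: _V1_FIELDS,
    2: _V1_FIELDS + _V2_ONLY_FIELDS,
}


def manifest_list_fields(format_version: int) -> Tuple[str, ...]:
    """Return the field names for the given format version.

    The version is a required argument, so a caller cannot
    silently end up with the V2 layout for a V1 table.
    """
    try:
        return MANIFEST_LIST_FIELDS[format_version]
    except KeyError:
        raise ValueError(f"Unsupported format version: {format_version}") from None
```

Making the version a required argument is the point of the sketch: the caller has to pass what the table metadata says rather than relying on a default.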

Add a test to verify table scan on a V1 table

Fokko commented 3 weeks ago

@kevinjqliu Thanks for raising this. Can you elaborate on what you encountered when reading a V1 table? The Iceberg metadata is forward compatible, meaning we can turn any V1 table into a V2 (or V3) without issues.

The content field you mention will always be DATA in V1 (since there are no delete files). This can be solved easily with initial-default values. We do this in other places, such as sequence numbers.
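
A minimal sketch of the initial-default idea, assuming records are plain dicts. This is an illustration only and not how PyIceberg implements it; `INITIAL_DEFAULTS`, `apply_initial_defaults`, and the example path are hypothetical.

```python
from enum import Enum
from typing import Any, Dict


class Content(Enum):
    # Values for the manifest-list content field in the Iceberg spec.
    DATA = 0
    DELETES = 1


# Hypothetical defaults for fields that a V1 writer never emitted.
INITIAL_DEFAULTS: Dict[str, Any] = {"content": Content.DATA.value}


def apply_initial_defaults(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with absent fields filled from the defaults.

    Values that are present in the record always win over the defaults.
    """
    filled = dict(INITIAL_DEFAULTS)
    filled.update(record)
    return filled


# A V1-style record without a content field (hypothetical path).
v1_record = {"manifest_path": "s3://bucket/metadata/m0.avro", "manifest_length": 1024}
filled_record = apply_initial_defaults(v1_record)

# A V2-style record that already carries a content value is left unchanged.
v2_record = {"manifest_path": "s3://bucket/metadata/m1.avro", "content": Content.DELETES.value}
unchanged_record = apply_initial_defaults(v2_record)
```

With defaults applied at read time, downstream code can access content unconditionally regardless of the table's format version.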

It would be great to get a test that uncovers the issue so we can get this fixed :)