apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Incorrect schema used when using time-travel #11162

Open fides-bot opened 1 week ago

fides-bot commented 1 week ago

Apache Iceberg version

1.5.0

Query engine

Spark

Please describe the bug 🐞

When using time travel to read an earlier version of a table by snapshot ID, the table's current schema is used instead of the snapshot's schema, contrary to the documentation.
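To make the expected behavior concrete, here is a minimal sketch of the two ways a schema could be chosen for a time-travel read. It is a toy model in plain Python, not Iceberg's API: the class names, the method names, and the snapshot IDs 101 and 102 are all hypothetical. It only assumes that each snapshot records the ID of the schema that was current when it was written.

```python
# Toy model of table metadata; NOT the real Iceberg API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema_id: int  # schema that was current when the snapshot was written


@dataclass
class TableMetadata:
    schemas: dict           # schema_id -> {field_id: column name}
    current_schema_id: int
    snapshots: dict         # snapshot_id -> Snapshot

    def schema_for_snapshot(self, snapshot_id):
        """Expected behavior: use the schema recorded with the snapshot."""
        return self.schemas[self.snapshots[snapshot_id].schema_id]

    def current_schema(self):
        """Behavior reported in this issue: always use the latest schema."""
        return self.schemas[self.current_schema_id]


# Hypothetical metadata mirroring the reproduction: field 3 is renamed
# from 'col' to 'value' between the two snapshots.
table = TableMetadata(
    schemas={
        0: {1: "id", 2: "data", 3: "col"},
        1: {1: "id", 2: "data", 3: "value"},
    },
    current_schema_id=1,
    snapshots={101: Snapshot(101, 0), 102: Snapshot(102, 1)},
)

expected = table.schema_for_snapshot(101)  # what the reporter expects
observed = table.current_schema()          # what the reporter sees

print("expected:", expected)
print("observed:", observed)
```

In this model, the rename changes only the name attached to field ID 3. The two lookups therefore disagree on the name alone, which is the mismatch the issue describes.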

Reproduction code:

# Create the table
spark_session.sql(f"CREATE TABLE iceberg_test (id bigint, data string, col float)")

# Populate the table
spark_session.sql(f"INSERT INTO iceberg_test values (1, 'a', 1.0), (2, 'b', 2.0), (3, 'c', 3.0)")

# Rename 'col' to 'value'
spark_session.sql(f"ALTER TABLE iceberg_test RENAME COLUMN col TO value")

# Insert a new row
spark_session.sql(f"INSERT INTO iceberg_test values (4, 'd', 4.0)")

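# Time-travel to the first snapshot, looked up in iceberg_test.snapshots
first_snapshot_id = spark_session.sql("SELECT snapshot_id FROM iceberg_test.snapshots ORDER BY committed_at").first()["snapshot_id"]
snapshot_1 = spark_session.sql(f"SELECT * FROM iceberg_test VERSION AS OF {first_snapshot_id}")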

# Operation on the renamed field
snapshot_1.filter("col == 2.0").show()

We end up with the following error:

Py4JJavaError: An error occurred while calling o111.showString.
: org.apache.iceberg.exceptions.ValidationException: Cannot find field 'col' in struct: struct<1: id: optional long, 2: data: optional string, 3: value: optional float>
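The struct in the message lists field 3 as `value`, which suggests the filter was bound by name against the post-rename schema. The sketch below illustrates that reading in plain Python. It is an assumption about the mechanism, not Iceberg's actual binding code; `bind_by_name` and `ValidationError` are hypothetical names, and the message format is simplified.

```python
# Toy model of binding a filter column to a schema by name.
# NOT Iceberg's implementation; names and message format are illustrative.
class ValidationError(Exception):
    pass


def bind_by_name(column, schema):
    """Return the field ID for `column`, or raise if the name is absent."""
    for field_id, name in schema.items():
        if name == column:
            return field_id
    fields = ", ".join(f"{fid}: {name}" for fid, name in schema.items())
    raise ValidationError(
        f"Cannot find field '{column}' in struct: struct<{fields}>"
    )


snapshot_schema = {1: "id", 2: "data", 3: "col"}   # schema before the rename
current_schema = {1: "id", 2: "data", 3: "value"}  # schema after the rename

# Binding against the snapshot's schema finds the column.
bound_id = bind_by_name("col", snapshot_schema)

# Binding against the current schema fails, as in the reported error.
try:
    bind_by_name("col", current_schema)
    error_message = None
except ValidationError as exc:
    error_message = str(exc)

print(bound_id)
print(error_message)
```

If this reading is right, filtering on `value` would bind successfully against the current schema, but that does not give the snapshot-schema behavior the documentation describes.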

jishangarg commented 1 week ago

Hi @fides-bot, could you tell me which version of Spark you are using?

fides-bot commented 1 week ago

Hi @jishangarg, we're using Spark 3.5.1