apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Spark: remove ROW_POSITION from project schema #11610

Open huaxingao opened 1 day ago

huaxingao commented 1 day ago

Originally, ReadConfig#generateOffsetToStartPos(Schema schema) computed the starting row offset of each row group, and it had to check whether the schema contains ROW_POSITION. https://github.com/apache/iceberg/pull/11520 switched to the native getRowIndexOffset, so generateOffsetToStartPos is no longer needed. As a result, we no longer need to add ROW_POSITION to the project schema.
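
For context, the row-offset bookkeeping described above can be sketched as a running sum of row counts. This is a minimal, self-contained illustration and not Iceberg's code: `RowGroupOffsets` and `startPositions` are hypothetical names, and the plain list of row counts is an assumed stand-in for the Parquet row-group metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names do not exist in Iceberg or Parquet.
final class RowGroupOffsets {

  // Returns the file-level position of the first row of each row group,
  // i.e. the running total of the row counts of all preceding row groups.
  static List<Long> startPositions(List<Long> rowCounts) {
    List<Long> starts = new ArrayList<>(rowCounts.size());
    long rowsSoFar = 0L;
    for (long count : rowCounts) {
      starts.add(rowsSoFar);
      rowsSoFar += count;
    }
    return starts;
  }

  public static void main(String[] args) {
    // Three row groups holding 100, 250, and 50 rows.
    System.out.println(startPositions(List.of(100L, 250L, 50L))); // prints [0, 100, 350]
  }
}
```

When the reader already exposes each row group's row index offset, as the description says the native getRowIndexOffset does, this manual pass over the row groups and the schema check that gated it are redundant.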

huaxingao commented 14 hours ago

cc @flyrain @szehon-ho

huaxingao commented 13 hours ago

@flyrain Yes, we need the same change in the older version too. Just added the changes.