apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Spark: Remove extra columns for ColumnBatch #11551

Open huaxingao opened 1 week ago

huaxingao commented 1 week ago

For equality deletes, we build a ColumnarBatchReader for the equality delete filter columns so that we can read their values and determine which rows are deleted. If these filter columns are not among the requested columns, they are extra columns and should be removed before the ColumnarBatch is returned to Spark.

Suppose the table schema includes C1, C2, C3, C4, and C5, the query is SELECT C5 FROM table, and the equality delete filter is on C3 and C4. We read the values of C3 and C4 to identify which rows are deleted, but we do not want to include these values in the ColumnarBatch that we return to Spark.
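The C3/C4/C5 example can be sketched in plain Java without any Spark or Iceberg dependencies. This is only an illustration of the idea, not the code in this PR: the Batch class and the applyEqualityDeletes and removeExtraColumns helpers are hypothetical names, and the sketch assumes the requested columns come first in the batch, with the extra filter columns appended after them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class ExtraColumnsSketch {

  // Hypothetical stand-in for a columnar batch: columns[c][r] is row r of column c.
  static final class Batch {
    final String[] names;
    final Object[][] columns;

    Batch(String[] names, Object[][] columns) {
      this.names = names;
      this.columns = columns;
    }

    int numRows() {
      return columns.length == 0 ? 0 : columns[0].length;
    }

    int indexOf(String name) {
      return Arrays.asList(names).indexOf(name);
    }
  }

  // Keeps only the rows whose values in the delete columns do not match a deleted key.
  static Batch applyEqualityDeletes(
      Batch batch, List<String> deleteColumns, Set<List<Object>> deletedKeys) {
    List<Integer> liveRows = new ArrayList<>();
    for (int row = 0; row < batch.numRows(); row++) {
      List<Object> key = new ArrayList<>();
      for (String column : deleteColumns) {
        key.add(batch.columns[batch.indexOf(column)][row]);
      }
      if (!deletedKeys.contains(key)) {
        liveRows.add(row);
      }
    }
    Object[][] filtered = new Object[batch.columns.length][liveRows.size()];
    for (int c = 0; c < batch.columns.length; c++) {
      for (int i = 0; i < liveRows.size(); i++) {
        filtered[c][i] = batch.columns[c][liveRows.get(i)];
      }
    }
    return new Batch(batch.names, filtered);
  }

  // Drops the trailing extra columns (assumption: requested columns come first).
  static Batch removeExtraColumns(Batch batch, int requestedCount) {
    return new Batch(
        Arrays.copyOf(batch.names, requestedCount),
        Arrays.copyOf(batch.columns, requestedCount));
  }

  public static void main(String[] args) {
    // SELECT C5 FROM table, with an equality delete on (C3, C4).
    Batch read = new Batch(
        new String[] {"C5", "C3", "C4"},
        new Object[][] {
          {"a", "b", "c"}, // C5: requested by the query
          {1, 2, 3},       // C3: extra, read only for the delete filter
          {10, 20, 30}     // C4: extra, read only for the delete filter
        });
    List<Object> deletedKey = List.of(2, 20);
    Set<List<Object>> deletedKeys = Set.of(deletedKey);

    Batch filtered = applyEqualityDeletes(read, List.of("C3", "C4"), deletedKeys);
    Batch result = removeExtraColumns(filtered, 1);

    System.out.println(Arrays.toString(result.names));      // [C5]
    System.out.println(Arrays.toString(result.columns[0])); // [a, c]
  }
}
```

The deletes are applied while C3 and C4 are still present, and the extra columns are dropped only afterwards, so the batch handed back contains just C5 with the deleted row already filtered out.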

huaxingao commented 6 days ago

cc @flyrain @szehon-ho @viirya