Describe the bug
Integrating Comet with Iceberg internally produces the following error when the Iceberg table contains deleted rows:
org.apache.comet.CometNativeException: Invalid argument error: all columns in a record batch must have the specified row count
This happens because Iceberg stores row mappings in its CometVector implementations and uses them to skip deleted rows while iterating over a batch; the row values in the underlying arrays are not actually removed. The Iceberg batch reader sets the row count of a returned record batch to the "logical" number of rows after deletion. Java Arrow accepts this. However, arrow-rs performs a stricter consistency check between the array lengths and the row-count parameter, and throws the error above when they disagree.
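The mismatch can be illustrated with a minimal, hypothetical sketch in plain Python (not the real Arrow or Iceberg APIs): the batch keeps its full physical arrays, reports a smaller logical row count after deletes, and an arrow-rs-style strict check rejects the inconsistency.

```python
class Batch:
    """Simplified record batch: physical column arrays plus a reported row count."""
    def __init__(self, columns, num_rows):
        self.columns = columns      # deleted rows are still physically present
        self.num_rows = num_rows    # "logical" row count after deletes

def strict_validate(batch):
    """arrow-rs-style check: every array length must equal num_rows exactly."""
    if any(len(col) != batch.num_rows for col in batch.columns):
        raise ValueError(
            "all columns in a record batch must have the specified row count")

# 4 physical rows; one is marked deleted, so the reader reports 3 logical rows.
batch = Batch(columns=[[1, 2, 3, 4]], num_rows=3)

# Java Arrow tolerates this layout in the scenario above, but the strict
# check rejects it with the same wording as the Comet error:
try:
    strict_validate(batch)
    rejected = False
except ValueError as e:
    rejected = True
    error_message = str(e)
```

Under this sketch, a fix would either materialize the deletes (compact the arrays to the logical length) or keep the physical length as the batch's row count and apply the row mapping downstream.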
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response