databricks / iceberg-kafka-connect


Record projection Index out of bounds error #288

Closed ismailsimsek closed 2 months ago

ismailsimsek commented 2 months ago

Hi,

I'm getting an `Index 2 out of bounds` error when writing equality deletes via `writer.deleteKey(keyProjection.wrap(row));`. I couldn't pinpoint the cause, but it looks like it might be a bug in the Iceberg code?

It seems that in `ParquetValueWriters`, the struct writer for delete files is built from the full record schema instead of the key schema, so `writers.length` equals the number of table columns. When writing a projected key row, it loops over all of those columns, which are more than the key fields, and then fails with an index error when it tries to look up a non-key field.

https://github.com/apache/iceberg/blob/ab2c6f889d07eeee51a1f58605be248e9330d91b/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java#L578-L583
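The mechanism described above can be illustrated without any Iceberg dependency. This is a minimal sketch, not Iceberg's actual `StructWriter`: a column-writer loop sized for the full schema (here 5 columns, a made-up number) is driven over a key-only row with 2 fields, and it blows up exactly like the reported error.

```java
import java.util.List;

public class WriterMismatchDemo {
    // Stand-in for a struct writer: one writer slot per schema column.
    // writerCount plays the role of writers.length in ParquetValueWriters.
    static void write(List<Object> row, int writerCount) {
        for (int i = 0; i < writerCount; i++) {
            // Throws IndexOutOfBoundsException once i exceeds the row's field count.
            Object value = row.get(i);
            System.out.println("wrote column " + i + " = " + value);
        }
    }

    public static void main(String[] args) {
        List<Object> keyRow = List.of(1L, "key-part-2"); // projected key row: 2 fields
        int fullSchemaColumns = 5;                       // writers built from the full table schema
        try {
            write(keyRow, fullSchemaColumns);
        } catch (IndexOutOfBoundsException e) {
            // Fails at index 2, mirroring the "Index 2 out of bounds" in the report.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```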

here is a test to reproduce it https://github.com/tabular-io/iceberg-kafka-connect/pull/287

cc @bryanck

ismailsimsek commented 2 months ago

Just found the issue! When constructing the `GenericAppenderFactory`, the full table schema was being passed as the equality-delete row schema instead of the key schema:

https://github.com/tabular-io/iceberg-kafka-connect/blob/595f835f5d9174e57660b12f407dabc84781e500/kafka-connect/src/test/java/io/tabular/iceberg/connect/data2/IcebergUtil.java#L96-L103
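For reference, a hedged sketch of what the corrected construction looks like. This is not the exact test code from the PR; it assumes a `table` handle whose schema declares identifier fields, and uses Iceberg's `TypeUtil.select` to project the key schema:

```java
import java.util.Set;

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.data.GenericAppenderFactory;
import org.apache.iceberg.types.TypeUtil;

// Assumed: a Table whose schema has identifier (key) fields set.
Table table = /* ... */;

Set<Integer> keyFieldIds = table.schema().identifierFieldIds();
int[] equalityFieldIds = keyFieldIds.stream().mapToInt(Integer::intValue).toArray();

// The key projection of the table schema, e.g. just the id column(s).
Schema keySchema = TypeUtil.select(table.schema(), keyFieldIds);

GenericAppenderFactory appenderFactory =
    new GenericAppenderFactory(
        table.schema(),    // row schema for data files
        table.spec(),
        equalityFieldIds,
        keySchema,         // eqDeleteRowSchema: the KEY schema, not table.schema()
        null);             // posDeleteRowSchema: unused here
```

With the full table schema in the `eqDeleteRowSchema` slot, the equality-delete writer allocates one column writer per table column, which is what triggered the index error when only key rows were written.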