Closed by bwjoh 1 month ago
Thanks for reporting the bug! Is it possible to provide a file that can reproduce this issue?
cc @gszadovszky this issue seems to be caused by a recent refactoring commit.
Thanks, @bwjoh. It seems I overlooked how this part worked. The code is not super clear, unfortunately. It also seems we are lacking a unit test for this scenario. Would you like to contribute a fix for this one?
Describe the bug, including details regarding any error messages, version, and platform.
Noticed when upgrading from 1.13.1 to 1.14.1
This appears to be due to PARQUET-2431 - https://github.com/apache/parquet-java/pull/1274/files#diff-362b7d44b24283c1bb1f6ca3e124cb72706a33ed96d86b58bf3339f20aafb4e9R732
Looking into how my code hit this, it seems that CorruptDeltaByteArrays.requiresSequentialReads was essentially doing the dataColumn instanceof RequiresPreviousReader check previously (CorruptDeltaByteArrays.requiresSequentialReads can only return true when encoding == Encoding.DELTA_BYTE_ARRAY, and org.apache.parquet.column.values.RequiresPreviousReader is only implemented by *DeltaByteArrayReader classes). With no check on previousReader instanceof RequiresPreviousReader, the ClassCastException is possible above.
This is more likely to happen when using org.apache.parquet.io.ColumnIOFactory#ColumnIOFactory() to read files without createdBy. In my case I was able to fix this by adding createdBy, knowing that all Parquet files I have were written after PARQUET-246, which prevents CorruptDeltaByteArrays.requiresSequentialReads from returning true.
Component(s)
No response
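For anyone following along, here is a minimal sketch of the kind of guard discussed in the report. It is a simplified model only: ValuesReader, RequiresPreviousReader, PlainReader, DeltaReader, linkUnguarded and linkGuarded are stand-ins defined locally for illustration, and the cast inside setPreviousReader is an assumption about how a delta reader might consume its predecessor. It does not reproduce the actual parquet-java code.

```java
class PreviousReaderGuardSketch {
    // Hypothetical stand-ins for the types named in the report; not the real parquet-java classes.
    interface ValuesReader {}

    interface RequiresPreviousReader {
        void setPreviousReader(ValuesReader previous);
    }

    // Models a reader for some other encoding; it does not implement RequiresPreviousReader.
    static class PlainReader implements ValuesReader {}

    // Models a *DeltaByteArrayReader that assumes its predecessor is of the same kind.
    static class DeltaReader implements ValuesReader, RequiresPreviousReader {
        DeltaReader previous;

        @Override
        public void setPreviousReader(ValuesReader previous) {
            // Assumption for this sketch: the reader casts its predecessor to its own type.
            this.previous = (DeltaReader) previous;
        }
    }

    // Trusts the requiresSequentialReads flag alone and only null-checks the previous reader.
    static void linkUnguarded(ValuesReader dataColumn, ValuesReader previousReader,
                              boolean requiresSequentialReads) {
        if (requiresSequentialReads && previousReader != null) {
            ((RequiresPreviousReader) dataColumn).setPreviousReader(previousReader);
        }
    }

    // Checks both readers explicitly before linking; returns whether the link was made.
    static boolean linkGuarded(ValuesReader dataColumn, ValuesReader previousReader,
                               boolean requiresSequentialReads) {
        if (requiresSequentialReads
                && dataColumn instanceof RequiresPreviousReader
                && previousReader instanceof RequiresPreviousReader) {
            ((RequiresPreviousReader) dataColumn).setPreviousReader(previousReader);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        DeltaReader current = new DeltaReader();
        ValuesReader incompatiblePrevious = new PlainReader();

        try {
            linkUnguarded(current, incompatiblePrevious, true);
            System.out.println("unguarded: linked");
        } catch (ClassCastException e) {
            System.out.println("unguarded: ClassCastException");
        }

        System.out.println("guarded linked: " + linkGuarded(current, incompatiblePrevious, true));
    }
}
```

In this model the guarded variant simply skips the link when the previous reader is of a different kind, which is the behaviour the instanceof check would restore. Whether skipping is the right outcome for real files is a question for the maintainers.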