Describe the bug, including details regarding any error messages, version, and platform.
LargeBinary and LargeString use int64 offsets, but the Binary and String types use int32 offsets, which makes them susceptible to slice-index-out-of-bounds errors when a column/array holds more than ~2 GB (2^31 bytes) of data.
To reproduce, try deserializing a Parquet file that is larger than 2.2 GB.
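For intuition, here is a minimal standalone sketch (not the library's actual code path) of the arithmetic: once an array's value buffer passes 2^31 bytes, the next int32 offset wraps negative, and using it as a slice index panics.

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Binary/String arrays address their value buffer with int32 offsets,
	// so a single array can span at most 2^31-1 bytes (~2 GiB) of values.
	var offset int32 = math.MaxInt32 // largest representable offset
	offset++                         // one more byte wraps the offset negative
	fmt.Println(offset)              // -2147483648; used as a slice index,
	                                 // this panics with "index out of range"
}
```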
A workaround is to force the Go library to deserialize the field/column as LargeBinary instead of Binary, by writing the file with the Arrow schema stored in the Parquet metadata.
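A minimal sketch of that workaround, assuming arrow-go's pqarrow package (module path shown for v15; adjust to your version — the schema, file name, and small payload here are purely illustrative). `pqarrow.WithStoreSchema()` embeds the serialized Arrow schema in the file metadata, so the reader reconstructs the column as LargeBinary (int64 offsets) instead of defaulting to Binary (int32 offsets):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/apache/arrow/go/v15/arrow"
	"github.com/apache/arrow/go/v15/arrow/array"
	"github.com/apache/arrow/go/v15/arrow/memory"
	"github.com/apache/arrow/go/v15/parquet"
	"github.com/apache/arrow/go/v15/parquet/pqarrow"
)

// writeWithStoredSchema writes tbl with the Arrow schema embedded in the
// Parquet metadata, so a LargeBinary field round-trips as LargeBinary
// rather than being read back as Binary.
func writeWithStoredSchema(tbl arrow.Table, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	arrProps := pqarrow.NewArrowWriterProperties(pqarrow.WithStoreSchema())
	return pqarrow.WriteTable(tbl, f, tbl.NumRows(),
		parquet.NewWriterProperties(), arrProps)
}

func main() {
	mem := memory.DefaultAllocator

	// Declare the column as LargeBinary up front; storing the schema only
	// helps if the schema being stored uses the large type.
	schema := arrow.NewSchema(
		[]arrow.Field{{Name: "payload", Type: arrow.BinaryTypes.LargeBinary}},
		nil)

	bldr := array.NewBinaryBuilder(mem, arrow.BinaryTypes.LargeBinary)
	defer bldr.Release()
	bldr.Append([]byte("hello"))
	arr := bldr.NewArray()
	defer arr.Release()

	col := arrow.NewColumnFromArr(schema.Field(0), arr)
	defer col.Release()
	tbl := array.NewTable(schema, []arrow.Column{col}, 1)
	defer tbl.Release()

	if err := writeWithStoredSchema(tbl, "large.parquet"); err != nil {
		log.Fatal(err)
	}

	// Reading back: the stored Arrow schema makes pqarrow map the column
	// to LargeBinary instead of Binary.
	f, err := os.Open("large.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	out, err := pqarrow.ReadTable(context.Background(), f,
		parquet.NewReaderProperties(mem), pqarrow.ArrowReadProperties{}, mem)
	if err != nil {
		log.Fatal(err)
	}
	defer out.Release()
	fmt.Println(out.Schema()) // payload: type=large_binary
}
```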
This relies on the `store_schema` option described in the Arrow docs: https://arrow.apache.org/docs/cpp/parquet.html#roundtripping-arrow-types-and-schema

Error looks like:
Version and platform:
Component(s)
Go