Status: Open. Opened by 0x26res 6 months ago.
I have encountered the same problem.
I think this problem is caused by a shortcut in the `if_else` kernel that is taken when `left` is invalid (null): https://github.com/apache/arrow/blob/apache-arrow-17.0.0/cpp/src/arrow/compute/kernels/scalar_if_else.cc#L740-L750
This shortcut is not safe when `right` is a chunk (slice) of another, larger array. In that case, `right`'s offsets may start in the middle of the larger array. Because the shortcut copies `right`'s values into a newly allocated output buffer, the output's offsets should start from zero, but a plain copy of `right`'s offsets is used instead.
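A toy model in plain Python may help picture this. It is not Arrow's code or data layout classes, and it assumes one reading of the comment above: the shortcut moves `right`'s bytes into a fresh buffer that starts at byte 0, but keeps `right`'s value offsets unchanged. `StringArray`, `copy_rebased`, and `copy_not_rebased` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StringArray:
    """Toy model of a string array (hypothetical, not Arrow's classes).

    Physical slot i holds data[value_offsets[i]:value_offsets[i + 1]];
    offset/length select the logical window, as a zero-copy slice does.
    """
    value_offsets: List[int]
    data: bytes
    offset: int
    length: int

    def values(self) -> List[bytes]:
        # Raw bytes of each logical element in the window.
        return [
            self.data[self.value_offsets[i]:self.value_offsets[i + 1]]
            for i in range(self.offset, self.offset + self.length)
        ]


def from_strings(strings: List[str]) -> StringArray:
    offsets, data = [0], b""
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))
    return StringArray(offsets, data, 0, len(strings))


def slice_array(arr: StringArray, start: int, length: int) -> StringArray:
    # Zero-copy slice: share the buffers, shift the logical offset.
    return StringArray(arr.value_offsets, arr.data, arr.offset + start, length)


def copy_rebased(arr: StringArray) -> StringArray:
    # Correct copy: move only the window's bytes and rebase the offsets to 0.
    window = arr.value_offsets[arr.offset:arr.offset + arr.length + 1]
    first, last = window[0], window[-1]
    return StringArray([o - first for o in window], arr.data[first:last], 0, arr.length)


def copy_not_rebased(arr: StringArray) -> StringArray:
    # Assumed buggy copy: bytes go into a fresh buffer starting at 0, but the
    # offsets still point into the original, larger buffer.
    window = arr.value_offsets[arr.offset:arr.offset + arr.length + 1]
    first, last = window[0], window[-1]
    return StringArray(list(window), arr.data[first:last], 0, arr.length)


def is_valid_utf8(b: bytes) -> bool:
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


base = from_strings(["a", "bb", "ccc", "dddd"])
right = slice_array(base, 1, 3)          # logical values: bb, ccc, dddd
print(copy_rebased(right).values())      # [b'bb', b'ccc', b'dddd']
print(copy_not_rebased(right).values())  # [b'bc', b'ccd', b'ddd']

# With multi-byte characters the shifted offsets split code points.
utf8 = slice_array(from_strings(["x", "é", "é"]), 1, 2)
print([is_valid_utf8(v) for v in copy_not_rebased(utf8).values()])  # [False, False]
```

In the toy, out-of-range reads are silently truncated by Python's slicing; in C++ the equivalent offsets would point past the end of the output data buffer, which is consistent with the wrong values and invalid UTF-8 reported below.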
Describe the bug, including details regarding any error messages, version, and platform.
I have a chunked array whose chunks are views (slices) of the same array.
When I call `if_else` on that chunked array, the results are wrong, and they can even contain strings that are not valid UTF-8.
For context, I'm loading data from a Parquet file and replacing empty strings with nulls. This started happening when the Parquet file grew and the data was split into chunks.
I've tested with pyarrow==16.0.0.
Component(s)
C++, Python