comphead opened 3 months ago
I'll look into this @comphead.
Thanks @parthchandra. The issue is likely in `org.apache.comet.parquet.TypeUtil.checkParquetType`, when deriving the decimal type.
Update on this: Spark's vectorized reader also throws the same error, so users have to turn off vectorized reading to read such files. It is also pretty near impossible to write a binary decimal field (as opposed to a fixed-length byte array field) using Spark; one has to use the Parquet writer directly or some other project (Avro, for example) to write such fields. In Comet there is in fact no implementation to decode a binary decimal field, just as there is none in the Spark vectorized reader. It should be possible to implement, but I'm wondering if this is a niche case. @comphead
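For anyone hitting this, the fallback mentioned above can be enabled through standard Spark configuration; a minimal sketch for a spark-shell session (the config key `spark.sql.parquet.enableVectorizedReader` is stock Spark, not Comet-specific, and the file path here is hypothetical):

```scala
// Workaround sketch: disable Spark's vectorized Parquet reader so the
// row-based parquet-mr path is used instead, which can decode
// BINARY-backed decimal columns. Assumes an active spark-shell session
// where `spark` is the SparkSession.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

val df = spark.read.parquet("/path/to/problem-file.parquet") // hypothetical path
df.show()
```

Note this trades off the performance of the vectorized path for the whole session (or until the config is set back to `true`).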
@comphead @parthchandra can we close this issue?
Well, the issue still exists; however, it is related to a deprecated Parquet representation where Decimal is stored as BINARY. We should probably mention in the docs that this kind of conversion is not supported.
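To illustrate the distinction (my own sketch in Parquet schema notation, not taken from the user's actual file): the deprecated representation backs the decimal with a variable-length BINARY primitive, while the commonly supported one uses FIXED_LEN_BYTE_ARRAY:

```
message spark_schema {
  // deprecated representation: decimal backed by variable-length BINARY
  optional binary amount (DECIMAL(10,2));

  // common representation: fixed-length byte array
  // (5 bytes is enough to hold precision 10)
  optional fixed_len_byte_array(5) amount_flba (DECIMAL(10,2));
}
```

Comet and the Spark vectorized reader handle the second form; it is the first form that triggers the crash described in this issue.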
Yes, let's close this. We can revisit this if more people report it.
Describe the bug
The user reported that Comet crashes when reading the Parquet file.
The Parquet file metadata is:
Spark without Comet reads the data with no issues
Steps to reproduce
No response
Expected behavior
Should read the value
Additional context
No response