Closed cxzl25 closed 2 months ago
BINARY type promotion is supported in Spark 4.0.0, and the test using a 4.0.0-SNAPSHOT build works in #1909:
[SPARK-40876][SQL] Widening type promotions in Parquet readers
Should we use INT32 and INT64 for decimals where applicable?
> Should we use INT32 and INT64 for decimals where applicable?
Yes, Spark does this by default. It also provides the option spark.sql.parquet.writeLegacyFormat=true to stay aligned with Hive's way of writing decimals.
```scala
writeLegacyParquetFormat match {
  // Standard mode, 1 <= precision <= 9, writes as INT32
  case false if precision <= Decimal.MAX_INT_DIGITS => int32Writer
  // Standard mode, 10 <= precision <= 18, writes as INT64
  case false if precision <= Decimal.MAX_LONG_DIGITS => int64Writer
  // Legacy mode, 1 <= precision <= 18, writes as FIXED_LEN_BYTE_ARRAY
  case true if precision <= Decimal.MAX_LONG_DIGITS => binaryWriterUsingUnscaledLong
  // Either standard or legacy mode, 19 <= precision <= 38, writes as FIXED_LEN_BYTE_ARRAY
  case _ => binaryWriterUsingUnscaledBytes
}
```
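For illustration, the selection logic above can be mirrored in plain Python. This is a hedged sketch; `parquet_physical_type` and its arguments are names made up for this example, not Spark APIs:

```python
# Sketch of Spark's decimal physical-type selection for Parquet writes,
# mirroring the Scala match above. Hypothetical helper, not a Spark API.
MAX_INT_DIGITS = 9    # Decimal.MAX_INT_DIGITS: max precision that fits in INT32
MAX_LONG_DIGITS = 18  # Decimal.MAX_LONG_DIGITS: max precision that fits in INT64

def parquet_physical_type(precision: int, write_legacy_format: bool = False) -> str:
    if not write_legacy_format and precision <= MAX_INT_DIGITS:
        return "INT32"
    if not write_legacy_format and precision <= MAX_LONG_DIGITS:
        return "INT64"
    # Legacy mode (precision <= 18) and any precision 19..38 both write
    # FIXED_LEN_BYTE_ARRAY, differing only in how the unscaled value is encoded.
    return "FIXED_LEN_BYTE_ARRAY"

print(parquet_physical_type(9))          # standard mode, small decimal
print(parquet_physical_type(18))         # standard mode, long-sized decimal
print(parquet_physical_type(18, True))   # legacy mode
print(parquet_physical_type(38))         # large decimal, either mode
```

This makes it easy to see why a reader that lacks BINARY/FIXED_LEN_BYTE_ARRAY support only affects decimals written in legacy mode or with precision above 18.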
Could you please check if an email sent from private@orc.apache.org was accidentally moved to the spam folder? @cxzl25
> Could you please check if an email sent from private@orc.apache.org was accidentally moved to the spam folder?
Wow, I did miss this email. Thank you @wgtmac so much for inviting me, thank you @dongjoon-hyun so much for commenting and merging multiple times, and thanks to the entire ORC community!
Sorry for the long delay. Thank you, @cxzl25 and @wgtmac . :)
Merged to main/2.0.
What changes were proposed in this pull request?
This PR aims to write Parquet decimal-type data in the benchmark using the FIXED_LEN_BYTE_ARRAY type.

Why are the changes needed?
The decimal columns in the Parquet files generated now map to Parquet's BINARY type, which Spark 3.5.1 does not support reading. Spark 3.5.1 can read the data if the FIXED_LEN_BYTE_ARRAY type is used instead.
main
PR
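For context, Parquet's FIXED_LEN_BYTE_ARRAY encoding of a decimal stores the unscaled value as big-endian two's-complement bytes, padded to a fixed length derived from the precision. A minimal sketch of that encoding (the helper names are hypothetical, not code from this PR):

```python
# Encode a decimal's unscaled value for Parquet's FIXED_LEN_BYTE_ARRAY type:
# big-endian two's-complement, padded to a fixed length chosen from precision.

def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte length whose signed range covers 10**precision - 1."""
    n = 1
    while 10 ** precision - 1 > 2 ** (8 * n - 1) - 1:
        n += 1
    return n

def encode_decimal_flba(unscaled: int, precision: int) -> bytes:
    return unscaled.to_bytes(min_bytes_for_precision(precision), "big", signed=True)

# Precision 9 fits in 4 bytes, precision 18 in 8, precision 38 in 16.
print(min_bytes_for_precision(9), min_bytes_for_precision(18), min_bytes_for_precision(38))
print(encode_decimal_flba(1234, 9).hex())   # 000004d2
print(encode_decimal_flba(-1, 9).hex())     # ffffffff
```

Every value in a column gets the same byte length, which is what lets readers that cannot handle variable-length BINARY decimals still read the data.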
How was this patch tested?
local test
Was this patch authored or co-authored using generative AI tooling?
No