NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

Widen type promotion for decimals with larger scale in Parquet Read [databricks] #11727

Closed: nartal1 closed this 1 day ago

nartal1 commented 6 days ago

This PR contributes to https://github.com/NVIDIA/spark-rapids/issues/11433 and https://github.com/NVIDIA/spark-rapids/issues/11512.

This PR adds support for type promotion to decimals with larger precision and scale when reading Parquet. As long as the precision increases by at least as much as the scale, decimal values can be promoted without loss of precision. A similar change was added in Apache Spark 4.0 (https://github.com/apache/spark/pull/44513). On the CPU, for all versions prior to Spark 4.0, the reader currently throws an exception if the scale of the read schema is not the same as the scale of the schema the data was written with. In spark-rapids, this fix is available for all supported Spark versions.
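For illustration, here is a minimal sketch of the widening rule described above. The object and method names are hypothetical, not the plugin's actual API: a Decimal(p1, s1) fits into Decimal(p2, s2) without loss when the scale does not shrink and the number of integral digits (precision minus scale) does not shrink either, which is equivalent to the precision growing by at least as much as the scale.

```scala
// Hypothetical sketch of the widening condition; not the plugin's API.
object DecimalWideningSketch {
  def canPromoteWithoutLoss(p1: Int, s1: Int, p2: Int, s2: Int): Boolean = {
    // Scale may only grow, and the integral digits (p - s) may not shrink.
    s2 >= s1 && (p2 - s2) >= (p1 - s1)
  }

  def main(args: Array[String]): Unit = {
    // DECIMAL(5, 2) -> DECIMAL(7, 3): precision +2, scale +1 => allowed
    println(canPromoteWithoutLoss(5, 2, 7, 3)) // true
    // DECIMAL(5, 2) -> DECIMAL(6, 4): precision +1, scale +2 => integral digits shrink
    println(canPromoteWithoutLoss(5, 2, 6, 4)) // false
  }
}
```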

We have removed the separate checks for whether a decimal can be read as int, long, or byte_array and consolidated them into a single function, canReadAsDecimal (a simplified sketch follows). An integration test was added to verify that the conditions for type promotion are met.
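A rough sketch of what such a consolidated check might look like, assuming a simplified representation of the Parquet physical types and the usual precision bounds for each encoding; the actual canReadAsDecimal in this PR may differ in signature and details.

```scala
// Illustrative sketch only; names and structure are assumptions, not the PR's code.
object CanReadAsDecimalSketch {
  sealed trait PhysicalType
  case object Int32 extends PhysicalType
  case object Int64 extends PhysicalType
  case object ByteArray extends PhysicalType

  // Maximum decimal precision each physical encoding can hold.
  private def maxPrecisionFor(pt: PhysicalType): Int = pt match {
    case Int32     => 9
    case Int64     => 18
    case ByteArray => 38
  }

  def canReadAsDecimal(
      filePrecision: Int, fileScale: Int,  // schema the file was written with
      readPrecision: Int, readScale: Int,  // schema requested by the reader
      physicalType: PhysicalType): Boolean = {
    // The file schema must fit the physical encoding, and the read schema
    // must be a lossless widening of the file schema.
    filePrecision <= maxPrecisionFor(physicalType) &&
      readScale >= fileScale &&
      (readPrecision - readScale) >= (filePrecision - fileScale)
  }
}
```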

nartal1 commented 6 days ago

build

nartal1 commented 4 days ago

build

nartal1 commented 2 days ago

@revans2 @mythrocks I have addressed the review comments. Reverted the formatting changes. PTAL.

nartal1 commented 2 days ago

build

nartal1 commented 2 days ago

build