NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] [Spark 4] Decimal casting errors raised from the plugin do not match those from Spark 4.0 in ANSI mode #11550

Open mythrocks opened 4 days ago

mythrocks commented 4 days ago

Description: On Spark 4.0 with ANSI mode enabled, when a DECIMAL(3,0) column is cast to a narrower decimal type (e.g. DECIMAL(1,0)), the error message raised by the plugin does not match the one from Apache Spark.

On Spark:

org.apache.spark.SparkArithmeticException: [NUMERIC_VALUE_OUT_OF_RANGE.WITH_SUGGESTION]  48 cannot be represented as Decimal(1, 0). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead. SQLSTATE: 22003
== SQL (line 1, position 2) ==
 cast(a as decimal(1,0))
 ^^^^^^^^^^^^^^^^^^^^^^^

On the Spark RAPIDS plugin:

org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] overflow occurred. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
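The mismatch is easy to check mechanically: the repro below asserts on a substring that appears only in the CPU message. A minimal pure-Python illustration (both messages copied verbatim from the outputs above):

```python
# Error messages copied verbatim from the CPU and GPU runs above.
cpu_msg = ('[NUMERIC_VALUE_OUT_OF_RANGE.WITH_SUGGESTION]  48 cannot be '
           'represented as Decimal(1, 0). If necessary set '
           '"spark.sql.ansi.enabled" to "false" to bypass this error, '
           'and return NULL instead. SQLSTATE: 22003')
gpu_msg = ('[ARITHMETIC_OVERFLOW] overflow occurred. If necessary set '
           '"spark.sql.ansi.enabled" to "false" to bypass this error. '
           'SQLSTATE: 22003')

# The repro asserts on this substring, which only the CPU message contains,
# so the GPU run fails the error-message comparison.
needle = 'cannot be represented as Decimal'
print(needle in cpu_msg)  # True
print(needle in gpu_msg)  # False
```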

Repro

Here is a minimal pytest repro:

# Helpers (DecimalGen, meta_idfn, unary_op_df, assert_gpu_and_cpu_error,
# ansi_enabled_conf) come from the plugin's integration_tests suite.
@pytest.mark.parametrize('data_gen', [
    DecimalGen(3, 0)], ids=meta_idfn('from:'))
@pytest.mark.parametrize('to_type', [
    DecimalType(1, 0)], ids=meta_idfn('to:'))
def test_ansi_cast_failures_decimal_to_decimal(data_gen, to_type):
    assert_gpu_and_cpu_error(
        lambda spark : unary_op_df(spark, data_gen).select(f.col('a').cast(to_type), f.col('a')).collect(),
        conf=ansi_enabled_conf,
        error_message="cannot be represented as Decimal")

Expected behavior: The overflow exception raised by the plugin should match the one produced by Spark 4.
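For context, both engines detect the same overflow condition; only the reporting differs. A rough illustration of the precision check in pure Python (an illustrative sketch using the `decimal` module, not Spark's actual implementation):

```python
from decimal import Decimal, ROUND_HALF_UP

def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Illustrative sketch only: a value fits Decimal(precision, scale)
    if, after rounding to `scale` fractional digits, its unscaled
    representation has at most `precision` digits."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=0 -> 1, scale=2 -> 0.01
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    return len(rounded.as_tuple().digits) <= precision

# 48 has two digits, so it fits Decimal(3, 0) but overflows Decimal(1, 0),
# matching the "48 cannot be represented as Decimal(1, 0)" message above.
print(fits(Decimal(48), 3, 0))  # True
print(fits(Decimal(48), 1, 0))  # False
```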

Misc: Depends on #11414.

mythrocks commented 4 days ago

The "correct" solution here would be to shim the code that generates the exception, ideally in RapidsErrorUtils.

The problem is that RapidsErrorUtils underwent a refactor as part of #11414, which has yet to be merged. Attempting to fix this simultaneously would lead to rework from merge conflicts.

I'm not inclined to fix this as part of addressing #11009. I will include this repro as part of #11009, with an xfail.