NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] [Spark 4] Exceptions from `DateTimeException`s do not match Spark exceptions with ANSI enabled #11641

Open rwlee opened 1 month ago

rwlee commented 1 month ago

Description: With ANSI enabled, when reading an invalid date in EXCEPTION mode, the exception string from the Spark RAPIDS plugin does not match the exception from Spark.

Noticed in csv_test.py::test_read_valid_and_invalid_dates

On Spark pre-4.0: DateTimeException

On Spark 4.0+:

'pyspark.errors.exceptions.captured.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0:
E       Fail to parse '2020-50-16' in the new parser.
E       You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid datetime string. SQLSTATE: 42K0B'

On Spark RAPIDS: DateTimeException: One or more values is not a valid date
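The mismatch above can be expressed as a version-dependent expectation. The following is a minimal sketch only: `spark_major`, `expected_date_error`, and `gpu_matches_cpu` are hypothetical helpers, not part of the spark-rapids test suite, and the exception names are taken from this report rather than from an exhaustive check of Spark versions.

```python
import re


def spark_major(version: str) -> int:
    """Extract the major version from a string such as '4.0.0' or '3.5.1-SNAPSHOT'."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return int(match.group(1))


def expected_date_error(version: str) -> str:
    """Exception name CPU Spark reports for an invalid date, as described in this issue."""
    # Assumption from this report: Spark 4.0+ surfaces SparkUpgradeException,
    # earlier versions surface DateTimeException.
    if spark_major(version) >= 4:
        return "SparkUpgradeException"
    return "DateTimeException"


def gpu_matches_cpu(gpu_message: str, version: str) -> bool:
    """True when the plugin's error text contains the name CPU Spark would report."""
    return expected_date_error(version) in gpu_message
```

With the plugin message quoted in this issue, `gpu_matches_cpu` holds for a 3.x version string and fails for a 4.x one, which is the discrepancy being reported.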

Repro: A workaround for the test failure is in flight. Once that fix is in, run the test_read_valid_and_invalid_dates test on Spark 4.0 with EXCEPTION mode enabled.

Expected behavior: The exception raised for an invalid date should match what Spark 4 produces.

Misc: Similar to #11556, #11552, and #11550: exception names and types do not align with Spark 4.0+ in ANSI mode.