pxLi opened 4 days ago
Also cc @res-life to help; this case is still unstable in non-UTC environments.
```scala
scala> spark.conf.set("spark.sql.session.timeZone", "Africa/Casablanca")

scala> spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")

scala> spark.sql("select unix_timestamp('42481005', 'yyyyMMdd')").show()
+----------------------------------+
|unix_timestamp(42481005, yyyyMMdd)|
+----------------------------------+
|                       71910716400|
+----------------------------------+

scala> spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

scala> spark.sql("select unix_timestamp('42481005', 'yyyyMMdd')").show()
+----------------------------------+
|unix_timestamp(42481005, yyyyMMdd)|
+----------------------------------+
|                       71910720000|
+----------------------------------+
```
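The one-hour gap can be reproduced outside Spark: CORRECTED mode is backed by `java.time`, while LEGACY mode is backed by `java.text.SimpleDateFormat`, and the two project Africa/Casablanca's UTC offset differently for far-future dates. A minimal sketch in plain Java (class and method names are illustrative, not Spark code; exact values depend on the JDK's tzdata):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;

public class CasablancaParse {
    // LEGACY-style path: java.text.SimpleDateFormat (hybrid Julian/Gregorian calendar)
    static long legacyEpochSeconds(String s, String zone) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
        sdf.setTimeZone(TimeZone.getTimeZone(zone));
        return sdf.parse(s).getTime() / 1000L;
    }

    // CORRECTED-style path: java.time (proleptic Gregorian calendar)
    static long correctedEpochSeconds(String s, String zone) {
        LocalDate d = LocalDate.parse(s, DateTimeFormatter.ofPattern("yyyyMMdd"));
        return d.atStartOfDay(ZoneId.of(zone)).toEpochSecond();
    }

    public static void main(String[] args) throws Exception {
        // On the reporter's setup these print 71910720000 (LEGACY-style)
        // and 71910716400 (CORRECTED-style), one hour apart.
        System.out.println(legacyEpochSeconds("42481005", "Africa/Casablanca"));
        System.out.println(correctedEpochSeconds("42481005", "Africa/Casablanca"));
    }
}
```

Both parse the same local date, 4248-10-05 00:00 in Africa/Casablanca; the divergence is purely in which UTC offset each calendar implementation assigns to that far-future instant.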
The GPU kernel is consistent with CORRECTED mode; it does not fully support LEGACY mode.

For LEGACY mode, we need to implement the Spark code path for LEGACY mode: Link

It uses `SimpleDateFormat`:
```scala
class LegacySimpleTimestampFormatter(
    pattern: String,
    zoneId: ZoneId,
    locale: Locale,
    lenient: Boolean = true) extends TimestampFormatter {

  @transient private lazy val sdf = {
    val formatter = new SimpleDateFormat(pattern, locale)
    formatter.setTimeZone(TimeZone.getTimeZone(zoneId))
    formatter.setLenient(lenient)
    formatter
  }

  override def parse(s: String): Long = {
    fromJavaTimestamp(new Timestamp(sdf.parse(s).getTime))
  }

  // ... (remaining methods elided)
}
```
- Disable this case for branch 24.10 when TZ is not UTC or Asia/Shanghai.
- Update the documentation to clarify that not all non-DST (daylight saving time) time zones are supported; only the Asia/Shanghai time zone has been tested.

Let's use this issue to track support for the Africa/Casablanca time zone in LEGACY mode.
thanks! retargeted issue to 24.12
We do not want to implement a kernel just for Africa/Casablanca unless we implement it for all time zones!!!
https://github.com/NVIDIA/spark-rapids/issues/6839
We picked Africa/Casablanca as an arbitrary time zone. It is one of the more complicated ones in terms of rules, but it is not there because a customer needs it. If we are going to develop a solution to a problem like this we want to develop a general purpose solution.
The issue https://github.com/NVIDIA/spark-rapids/issues/6839 is for non-LEGACY mode. This requirement is for LEGACY mode. So this is another requirement.
For LEGACY mode, from the code: it invokes `LegacySimpleTimestampFormatter` to get a `java.sql.Timestamp`, then invokes `fromJavaTimestamp` and `rebaseJulianToGregorianMicros`. So it will need another kernel. Spark behaves differently between LEGACY mode and non-LEGACY mode.
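To illustrate why the `fromJavaTimestamp`/`rebaseJulianToGregorianMicros` step exists: a `java.sql.Timestamp` produced by `SimpleDateFormat` sits on the hybrid Julian/Gregorian calendar, while Spark's internal microseconds are proleptic Gregorian, so ancient dates shift by up to several days when rebased. A hedged sketch in plain Java (`RebaseDemo` is an illustrative name; this is not Spark's implementation):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.TimeZone;

public class RebaseDemo {
    // Epoch day of a yyyyMMdd string under the legacy hybrid Julian/Gregorian
    // calendar, i.e. what a java.sql.Timestamp from SimpleDateFormat carries
    static long hybridEpochDay(String yyyymmdd) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return Math.floorDiv(sdf.parse(yyyymmdd).getTime(), 86_400_000L);
    }

    // Same date on the proleptic Gregorian calendar (Spark's internal model)
    static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    public static void main(String[] args) throws Exception {
        // 0001-01-01: the hybrid calendar (Julian before 1582) lands two days
        // earlier than proleptic Gregorian; this gap is what a Julian-to-Gregorian
        // rebase corrects. Modern dates show no shift.
        System.out.println(prolepticEpochDay(1, 1, 1) - hybridEpochDay("00010101"));
        System.out.println(prolepticEpochDay(2020, 1, 1) - hybridEpochDay("20200101"));
    }
}
```

This per-date shift is why the LEGACY path cannot simply reuse the existing non-LEGACY GPU kernel.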
I think the priority for this issue is low.
**Describe the bug**
This is a failing non-UTC test in 24.10, first seen in rapids_it-non-utc-pre_release, run:123:
cpu = 231388876800, gpu = 231388873200

**Expected behavior**
Pass or ignore the case.