NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] GPU plans are sometimes not generated unnecessarily because of a non-UTC timezone #11373

Open jihoonson opened 2 months ago

jihoonson commented 2 months ago

Describe the bug

The planner sometimes fails to generate a GPU plan because the timezone is set to a non-UTC value. However, this fallback does not always seem necessary.

Steps/Code to reproduce bug

Seq(("a", "a"),("b", "b")).toDF("k", "v").createOrReplaceTempView("tmp")
spark.sql("""select k, first(v) col2 from tmp group by k;""").show()

...

24/08/21 11:49:01 WARN GpuOverrides:
!Exec <SortAggregateExec> cannot run on GPU because not all expressions can be replaced
  @Expression <AttributeReference> k#7 could run on GPU
  @Expression <AggregateExpression> first(v#8, false) could run on GPU
    @Expression <First> first(v#8)() could run on GPU
      @Expression <AttributeReference> v#8 could run on GPU
  @Expression <AttributeReference> first(v#8)()#91 could run on GPU
  @Expression <Alias> toprettystring(k#7, Some(America/Los_Angeles)) AS toprettystring(k)#97 could run on GPU
    !Expression <ToPrettyString> toprettystring(k#7, Some(America/Los_Angeles)) cannot run on GPU because class org.apache.spark.sql.catalyst.expressions.ToPrettyString is not supported with timezone settings: (JVM: America/Los_Angeles, session: America/Los_Angeles). Set both of the timezones to UTC to enable class org.apache.spark.sql.catalyst.expressions.ToPrettyString support
      @Expression <AttributeReference> k#7 could run on GPU
  @Expression <Alias> toprettystring(first(v#8)()#91, Some(America/Los_Angeles)) AS toprettystring(col2)#98 could run on GPU
    !Expression <ToPrettyString> toprettystring(first(v#8)()#91, Some(America/Los_Angeles)) cannot run on GPU because class org.apache.spark.sql.catalyst.expressions.ToPrettyString is not supported with timezone settings: (JVM: America/Los_Angeles, session: America/Los_Angeles). Set both of the timezones to UTC to enable class org.apache.spark.sql.catalyst.expressions.ToPrettyString support
      @Expression <AttributeReference> first(v#8)()#91 could run on GPU
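
The message above names two separate settings, the JVM default timezone and the Spark session timezone. A minimal sketch for inspecting both from spark-shell follows; it assumes `spark` is the active SparkSession that spark-shell provides, and it is a diagnostic and workaround only, not a fix for the issue reported here.

```scala
import java.util.TimeZone

// JVM default timezone of the driver (the "JVM" value in the log message).
val jvmZone = TimeZone.getDefault.getID

// Spark session timezone (the "session" value in the log message).
val sessionZone = spark.conf.get("spark.sql.session.timeZone")

println(s"JVM: $jvmZone, session: $sessionZone")

// The session timezone can be changed at runtime.
spark.conf.set("spark.sql.session.timeZone", "UTC")
```

The JVM default is normally fixed at launch, for example by passing `-Duser.timezone=UTC` through `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions`, so changing only the session setting may not be enough to satisfy the check quoted in the log.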

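The query above fails to produce a GPU plan because of the two ToPrettyString expressions, for example toprettystring(first(v#8)()#91, Some(America/Los_Angeles)). Both k and v are string columns, so the timezone should not affect their output.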

Expected behavior

The query should run on the GPU whenever its result cannot be affected by the timezone setting.
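
One way to picture the expected behavior is a check on the input data type rather than on the expression class alone. The sketch below is an illustration, not the plugin's actual logic: `dependsOnTimezone` is a hypothetical helper, and it assumes that `TimestampType` (including when nested in arrays, maps, or structs) is the only type whose string rendering depends on the timezone.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: returns true when a value of this type could render
// differently under different timezones. Assumption: only TimestampType
// is timezone-sensitive; nested types are checked recursively.
def dependsOnTimezone(dt: DataType): Boolean = dt match {
  case TimestampType => true
  case ArrayType(elementType, _) => dependsOnTimezone(elementType)
  case MapType(keyType, valueType, _) =>
    dependsOnTimezone(keyType) || dependsOnTimezone(valueType)
  case StructType(fields) => fields.exists(f => dependsOnTimezone(f.dataType))
  case _ => false
}

// The columns in the reproduction are strings, so under this assumption
// a non-UTC timezone would not need to block the GPU plan.
val stringIsSensitive = dependsOnTimezone(StringType)
val timestampArrayIsSensitive = dependsOnTimezone(ArrayType(TimestampType))
```

Under this sketch, a planner would fall back to the CPU for a non-UTC timezone only when the check returns true for the child expression's type.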

sameerz commented 2 months ago

This impacts customers not running in UTC.