apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] Remaining issues for typed imperative aggregate #4763

Open liujiayi771 opened 7 months ago

liujiayi771 commented 7 months ago

Description

Excluded UTs:

  1. SPARK-31993: concat_ws in agg function with plenty of string/array types columns (in GlutenStringFunctionsSuite). Reason: if all input values are null, collect_list in vanilla Spark returns an empty array, but array_agg in Velox returns null.
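The vanilla Spark side of this difference can be checked with a small sketch like the one below. It assumes a local Spark session without Gluten; the object and method names (`CollectListAllNulls`, `collectAllNulls`) are made up for illustration and do not come from the Gluten test suite.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list}

// Illustrative helper, not part of Gluten or its test suites.
object CollectListAllNulls {
  // Aggregates a string column whose values are all null and returns the collected array.
  def collectAllNulls(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val df = Seq[String](null, null).toDF("s")
    // collect_list skips null inputs, so vanilla Spark yields an empty array here.
    df.select(collect_list(col("s"))).head.getSeq[String](0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collect-list-all-nulls")
      .getOrCreate()
    println(collectAllNulls(spark).isEmpty)
    spark.stop()
  }
}
```

Per the reason given above, the same aggregation offloaded to Velox's array_agg would return null instead of an empty array, which is why the test is excluded.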

felipepessoto commented 3 months ago

@liujiayi771 do you know whether collect_set is expected to work with complex types when a nested value is null? For example, this works with vanilla Spark but fails when Gluten is enabled:

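import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._ // needed for Seq(...).toDS outside spark-shell

// One JSON record whose struct has a null nested field (lastUpdated).
val jsonStr = """{"txn":{"appId":"txnId","version":0,"lastUpdated":null}}"""
val jsonSchema = StructType(Seq(StructField("txn",
  StructType(Seq(
    StructField("appId", StringType, true),
    StructField("lastUpdated", LongType, true),
    StructField("version", LongType, true))), true)))
// collect_set over the struct column is the call that fails under Gluten.
val df = spark.read.schema(jsonSchema).json(Seq(jsonStr).toDS).select(collect_set(col("txn")))
df.head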

Error:

[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (c7f5 executor driver): org.apache.gluten.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxUserError
[info] Error Source: USER
[info] Error Code: INVALID_ARGUMENT
[info] Reason: ROW comparison not supported for values that contain nulls
[info] Retriable: False
[info] Expression: !decoded.base()->containsNullAt(indices[index])
[info] Function: checkNestedNulls
[info] File: /__w/1/s/Velox/velox/functions/lib/CheckNestedNulls.cpp
[info] Line: 34

liujiayi771 commented 3 months ago

@felipepessoto This is a known issue; the Velox backend does not yet support it.
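
For readers unfamiliar with the error, the rule reported in the trace ("ROW comparison not supported for values that contain nulls") can be modelled in plain Scala. This is only an illustrative sketch: the `Row` alias and both functions are hypothetical, operate on a flat row, and are not Velox's actual `checkNestedNulls` implementation.

```scala
// Hypothetical plain-Scala model of the reported rule; not Velox code.
object NestedNullCheck {
  // A ROW is modelled as a flat sequence of optional fields; None stands for SQL NULL.
  type Row = Seq[Option[Any]]

  // True when at least one field of the row is null.
  def containsNull(row: Row): Boolean = row.exists(_.isEmpty)

  // Rejects rows that hold nulls, mirroring the message in the stack trace.
  def checkNestedNulls(row: Row): Either[String, Row] =
    if (containsNull(row)) Left("ROW comparison not supported for values that contain nulls")
    else Right(row)

  def main(args: Array[String]): Unit = {
    // Same shape as the txn struct above: appId, lastUpdated (null), version.
    val txn: Row = Seq(Some("txnId"), None, Some(0L))
    println(checkNestedNulls(txn).isLeft)
  }
}
```

In this model the `txn` value from the reproduction is rejected because `lastUpdated` is null, which matches the failure seen when collect_set has to compare struct values for deduplication.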