Open FelixYBW opened 2 months ago
@JkSelf can you take a look?
@FelixYBW Sure. I will look at this issue later. Thanks.
Spark does not support map type as a grouping expression. The error should be: org.apache.spark.sql.AnalysisException: expression x1.v1 cannot be used as a grouping expression because its data type map<string,int> is not an orderable data type. I don't know why the error message is different; what's your Spark version? @FelixYBW
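For reference, a minimal sketch that reproduces this vanilla-Spark behavior (the table/column names x1 and v1 follow the error message above; everything else is illustrative):

import org.apache.spark.sql.SparkSession

object MapGroupByRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("repro").getOrCreate()
    import spark.implicits._
    // One-row table with a map<string,int> column.
    val df = Seq(Map("a" -> 1)).toDF("v1")
    df.createOrReplaceTempView("x1")
    // Analysis of this query should fail on vanilla Spark with:
    // org.apache.spark.sql.AnalysisException: expression x1.v1 cannot be used
    // as a grouping expression because its data type map<string,int> is not
    // an orderable data type.
    spark.sql("SELECT v1, count(*) FROM x1 GROUP BY v1").show()
  }
}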
@FelixYBW @zml1206 It seems Spark doesn't support GROUP BY on a map type. I tried both Spark 3.2 and 3.4 and got the following exception:
org.apache.spark.sql.AnalysisException: expression type1.map cannot be used as a grouping expression because its data type map<string,string> is not an orderable data type.;
Aggregate [map#14], [byte#0, short#1, int#2, long#3L, float#4, double#5, decimal#6, string#7, binary#8, bool#9, date#10, timestamp#11, array#12, struct#13, map#14]
+- SubqueryAlias type1
+- View (`type1`, [byte#0,short#1,int#2,long#3L,float#4,double#5,decimal#6,string#7,binary#8,bool#9,date#10,timestamp#11,array#12,struct#13,map#14])
+- RelationV2[byte#0, short#1, int#2, long#3L, float#4, double#5, decimal#6, string#7, binary#8, bool#9, date#10, timestamp#11, array#12, struct#13, map#14] parquet file:/mnt/DP_disk3/jk/projects/gluten/backends-velox/target/scala-2.12/test-classes/data-type-validation-data/type1
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:52)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:51)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:182)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkValidGroupingExprs$1(CheckAnalysis.scala:328)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$15(CheckAnalysis.scala:340)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$15$adapted(CheckAnalysis.scala:340)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:340)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:97)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:97)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:92)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:182)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:205)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:202)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:75)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:75)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:65)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
at org.apache.gluten.execution.WholeStageTransformerSuite.$anonfun$compareResultsAgainstVanillaSpark$1(WholeStageTransformerSuite.scala:297)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
at org.apache.gluten.execution.WholeStageTransformerSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(WholeStageTransformerSuite.scala:40)
It's odd that the query passed here when I use vanilla Spark.
Have you replaced the Analyzer? @FelixYBW
The customer enhanced their Spark to add the support.
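For anyone who only needs vanilla-compatible behavior, one common workaround (a sketch, not the customer's actual change) is to group on a canonical, orderable projection of the map instead of the map itself. map_entries turns the map into array<struct<key,value>>, which Spark can order, and array_sort gives the entries a canonical order:

import org.apache.spark.sql.SparkSession

object MapGroupByWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("workaround").getOrCreate()
    import spark.implicits._
    // Two logically equal maps whose internal entry order may differ.
    val df = Seq(Map("a" -> 1, "b" -> 2), Map("b" -> 2, "a" -> 1)).toDF("v1")
    df.createOrReplaceTempView("x1")
    // Grouping by the sorted entry array puts equal maps in the same group,
    // sidestepping the "not an orderable data type" analysis error.
    spark.sql(
      "SELECT array_sort(map_entries(v1)) AS k, count(*) AS cnt " +
      "FROM x1 GROUP BY array_sort(map_entries(v1))").show(false)
  }
}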
Backend
VL (Velox)
Bug description
Empty table with schema: v1: map<string,int>, v2: map<string,int>, v3: map<string,struct<max:int,avg:int>>
SQL:
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response