apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/

[CH] Failed bind reference exception in `select distinct .... with cube` query #7759

Open KevinyhZou opened 1 day ago

KevinyhZou commented 1 day ago

Backend

CH (ClickHouse)

Bug description

Test table schema: (id bigint, name string, day string)

Query SQL:

select distinct day, name from (
  select '2024-10-29' day,
         coalesce(name, 'all') name,
         cnt
  from (
    select count(distinct id) as cnt,
           if(upper(name) regexp '^[A-Z]{2}$', name, 'unknow') name
    from test_tbl3
    group by name
    with cube
  )
) limit 10;
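
To make the report easier to reproduce, here is a minimal sketch of the table setup in spark-shell (Scala). The storage format and sample rows are assumptions, since the issue only gives the schema, and Gluten with the CH backend is assumed to be enabled in the session; running the query above against this table then triggers the exception shown below.

// Hypothetical DDL and sample data (assumptions; the issue only gives the schema).
// Assumes a spark-shell session with Gluten's ClickHouse backend already enabled.
spark.sql("CREATE TABLE test_tbl3 (id BIGINT, name STRING, day STRING) USING parquet")
spark.sql("INSERT INTO test_tbl3 VALUES (1, 'AB', '2024-10-29'), (2, 'cd', '2024-10-29')")
// Running the query quoted above against this table fails with the
// "Failed to bind reference" exception shown in the next section.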

Exception message:

Caused by: java.lang.UnsupportedOperationException: Failed to bind reference for name#10: Couldn't find name#10 in [name#5]
    at org.apache.gluten.expression.ExpressionConverter$.replaceWithExpressionTransformer0(ExpressionConverter.scala:223)
    at org.apache.gluten.expression.ExpressionConverter$.$anonfun$replaceWithExpressionTransformer0$22(ExpressionConverter.scala:714)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.gluten.expression.ExpressionConverter$.replaceWithExpressionTransformer0(ExpressionConverter.scala:714)
    at org.apache.gluten.expression.ExpressionConverter$.$anonfun$replaceWithExpressionTransformer0$22(ExpressionConverter.scala:714)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
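
For context (not stated in the issue itself), the message has the same shape as Spark's Catalyst reference binding: attributes are looked up by expression ID in the child operator's output, so binding fails when the IDs differ even though the names match. A minimal illustrative sketch against Spark's internal API:

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.StringType

// Two attributes both named "name" but with different expression IDs,
// analogous to name#10 vs name#5 in the error message above.
val wanted   = AttributeReference("name", StringType)()
val provided = AttributeReference("name", StringType)()

// Binding `wanted` against an input that only contains `provided` throws
// "Couldn't find name#... in [name#...]".
BindReferences.bindReference(wanted, Seq(provided))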

Spark version

Spark-3.3.x

Spark configurations

No response

System information

No response

Relevant logs

No response

KevinyhZou commented 13 hours ago

The plan:

CollectLimit 10
+- CHNativeColumnarToRow
   +- ^(16) ProjectExecTransformer [2024-10-29 AS day#260, coalesce(name#273, all)#275 AS name#261]
      +- ^(16) HashAggregateTransformer(keys=[2024-10-29#274, coalesce(name#273, all)#275], functions=[], isStreamingAgg=false)
         +- ^(16) InputIteratorTransformer[2024-10-29#274, coalesce(name#273, all)#275]
            +- ColumnarExchange hashpartitioning(2024-10-29#274, coalesce(name#273, all)#275, 1), ENSURE_REQUIREMENTS, [plan_id=1362], [shuffle_writer_type=hash], [OUTPUT] List(2024-10-29:StringType, coalesce(name#273, all):StringType)
               +- ^(15) HashAggregateTransformer(keys=[2024-10-29#274, coalesce(name#273, all)#275], functions=[], isStreamingAgg=false)
                  +- ^(15) ExpandExecTransformer [[name#268, name#268, 2024-10-29#274, coalesce(name#273, all)#275], [null, name#268, 2024-10-29#274, coalesce(name#273, all)#275]], [name#273, name#268, 2024-10-29#274, coalesce(name#273, all)#275]
                     +- ^(15) !ProjectExecTransformer [name#268, 2024-10-29 AS 2024-10-29#274, coalesce(name#273, all) AS coalesce(name#273, all)#275]
                        +- ^(15) NativeFileScan parquet default.test_tbl3[name#268] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[hdfs://testcluster/user/hive/warehouse/test_tbl3], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string>

The plan does not seem right at the ExpandExecTransformer: its first projection is [name#268, name#268, 2024-10-29#274, coalesce(name#273, all)#275], and the child !ProjectExecTransformer computes coalesce(name#273, all) even though name#273 is not in its scan input [name#268]; name#273 is only defined as an output of the Expand itself.
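
For reference, the plan above is what Spark's explain output shows after Gluten's columnar rules have been applied; a sketch for inspecting it (`failingQuery` is a hypothetical val holding the SQL from the bug description):

// failingQuery is a hypothetical val holding the SQL from the bug description.
val df = spark.sql(failingQuery)
// Operators whose required attributes are not all produced by their child are
// prefixed with "!" in the plan string, as on the ProjectExecTransformer above.
df.explain()  // or println(df.queryExecution.executedPlan.treeString)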

lgbo-ustc commented 12 hours ago

Check the rule PushdownAggregatePreProjectionAheadExpand.