apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[CH] regexp_extract on a string column containing a null value throws "Index value 1 is out of range, should be in [0, 1)" #4774

Open ditgittube opened 8 months ago

ditgittube commented 8 months ago

Backend

CH (ClickHouse)

Bug description

create table regex_test (id int, name string) using parquet;
insert into regex_test values (null, null);
select regexp_extract(name, 'a', 1) from regex_test;

Vanilla Spark returns:

+--------------------------+
|regexp_extract(name, a, 1)|
+--------------------------+
|null                      |
+--------------------------+

But the CH backend throws an exception:

java.lang.RuntimeException: Index value 1 is out of range, should be in [0, 1)
0. ./cmake-build-relwithdebinfo-llvm15/./contrib/llvm-project/libcxx/include/exception:134: Poco::Exception::Exception(String const&, int) @ 0x13edf03a in /usr/local/clickhouse/lib/libch.so
1. ./cmake-build-relwithdebinfo-llvm15/./src/Common/Exception.cpp:91: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0xbb4f615 in /usr/local/clickhouse/lib/libch.so
2. DB::Exception::Exception<long&, unsigned int>(int, FormatStringHelperImpl<std::type_identity<long&>::type, std::type_identity<unsigned int>::type>, long&, unsigned int&&) @ 0xaf1dd37 in /usr/local/clickhouse/lib/libch.so
3. DB::(anonymous namespace)::FunctionRegexpExtract::executeImpl(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long) const @ 0xaf1d2c1 in /usr/local/clickhouse/lib/libch.so
4. DB::IFunction::executeImplDryRun(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long) const @ 0x6c4a32a in /usr/local/clickhouse/lib/libch.so
5. DB::FunctionToExecutableFunctionAdaptor::executeDryRunImpl(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long) const @ 0x6c49e4e in /usr/local/clickhouse/lib/libch.so
6. ./cmake-build-relwithdebinfo-llvm15/./src/Functions/IFunction.cpp:0: DB::IExecutableFunction::executeWithoutLowCardinalityColumns(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const @ 0xf47c391 in /usr/local/clickhouse/lib/libch.so
7. ./cmake-build-relwithdebinfo-llvm15/./src/Functions/IFunction.cpp:0: DB::IExecutableFunction::defaultImplementationForNulls(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const @ 0xf47c6aa in /usr/local/clickhouse/lib/libch.so
8. ./cmake-build-relwithdebinfo-llvm15/./src/Functions/IFunction.cpp:0: DB::IExecutableFunction::executeWithoutLowCardinalityColumns(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const @ 0xf47c35d in /usr/local/clickhouse/lib/libch.so
9. ./cmake-build-relwithdebinfo-llvm15/./contrib/boost/boost/smart_ptr/intrusive_ptr.hpp:115: DB::IExecutableFunction::executeWithoutSparseColumns(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const @ 0xf47cd12 in /usr/local/clickhouse/lib/libch.so
10. ./cmake-build-relwithdebinfo-llvm15/./src/Functions/IFunction.cpp:384: DB::IExecutableFunction::execute(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&, std::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const @ 0xf47dd9b in /usr/local/clickhouse/lib/libch.so
11. ./cmake-build-relwithdebinfo-llvm15/./src/Interpreters/ActionsDAG.cpp:0: DB::ActionsDAG::updateHeader(DB::Block) const @ 0xf97ff69 in /usr/local/clickhouse/lib/libch.so
12. ./cmake-build-relwithdebinfo-llvm15/./src/Processors/Transforms/ExpressionTransform.cpp:0: DB::ExpressionTransform::transformHeader(DB::Block, DB::ActionsDAG const&) @ 0x112d51a5 in /usr/local/clickhouse/lib/libch.so
13. ./cmake-build-relwithdebinfo-llvm15/./src/Processors/QueryPlan/ExpressionStep.cpp:32: DB::ExpressionStep::ExpressionStep(DB::DataStream const&, std::shared_ptr<DB::ActionsDAG> const&) @ 0x11408474 in /usr/local/clickhouse/lib/libch.so
14. ./cmake-build-relwithdebinfo-llvm15/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:0: local_engine::ProjectRelParser::parseProject(std::unique_ptr<DB::QueryPlan, std::default_delete<DB::QueryPlan>>, substrait::Rel const&, std::list<substrait::Rel const*, std::allocator<substrait::Rel const*>>&) @ 0xbcb3fbf in /usr/local/clickhouse/lib/libch.so
15. ./cmake-build-relwithdebinfo-llvm15/./utils/extern-local-engine/Parser/ProjectRelParser.cpp:0: local_engine::ProjectRelParser::parse(std::unique_ptr<DB::QueryPlan, std::default_delete<DB::QueryPlan>>, substrait::Rel const&, std::list<substrait::Rel const*, std::allocator<substrait::Rel const*>>&) @ 0xbcb3dd9 in /usr/local/clickhouse/lib/libch.so
16. ./cmake-build-relwithdebinfo-llvm15/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:303: local_engine::SerializedPlanParser::parseOp(substrait::Rel const&, std::list<substrait::Rel const*, std::allocator<substrait::Rel const*>>&) @ 0xbc530dd in /usr/local/clickhouse/lib/libch.so
17. ./cmake-build-relwithdebinfo-llvm15/./utils/extern-local-engine/Parser/SerializedPlanParser.cpp:0: local_engine::SerializedPlanParser::parse(std::unique_ptr<substrait::Plan, std::default_delete<substrait::Plan>>) @ 0xbc51bf6 in /usr/local/clickhouse/lib/libch.so
18. ./cmake-build-relwithdebinfo-llvm15/./utils/extern-local-engine/Parser/SerializedPlanParser.cpp:2027: local_engine::SerializedPlanParser::parse(String const&) @ 0xbc639cd in /usr/local/clickhouse/lib/libch.so
19. ./cmake-build-relwithdebinfo-llvm15/./utils/extern-local-engine/local_engine_jni.cpp:327: Java_io_glutenproject_vectorized_ExpressionEvaluatorJniWrapper_nativeCreateKernelWithIterator @ 0x6bb8cad in /usr/local/clickhouse/lib/libch.so

    at io.glutenproject.vectorized.ExpressionEvaluatorJniWrapper.nativeCreateKernelWithIterator(Native Method)
    at io.glutenproject.vectorized.CHNativeExpressionEvaluator.createKernelWithBatchIterator(CHNativeExpressionEvaluator.java:97)
    at io.glutenproject.backendsapi.clickhouse.CHIteratorApi.genFirstStageIterator(CHIteratorApi.scala:131)
    at io.glutenproject.execution.GlutenWholeStageColumnarRDD.compute(GlutenWholeStageColumnarRDD.scala:139)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.sql.execution.CHColumnarToRowRDD.compute(CHColumnarToRowExec.scala:98)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:568)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1647)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:571)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
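
For readers trying to make sense of the message: the pattern 'a' has no capturing groups, so the only valid group index is 0, which matches the reported range [0, 1). The trace also suggests the exception is raised while the expression header is being built (ActionsDAG::updateHeader, via executeImplDryRun), so it would not depend on the row values. The Python sketch below is only a model of the two orderings, not the actual Spark or ClickHouse code; the function names extract_null_first and extract_check_first are hypothetical.

```python
import re
from typing import Optional


def group_index_range(pattern: str) -> range:
    """Valid group indexes: 0 (whole match) up to the number of capturing groups."""
    return range(0, re.compile(pattern).groups + 1)


def check_index(pattern: str, index: int) -> None:
    """Raise if the requested group index does not exist in the pattern."""
    valid = group_index_range(pattern)
    if index not in valid:
        raise ValueError(
            f"Index value {index} is out of range, should be in [0, {valid.stop})"
        )


def extract_null_first(value: Optional[str], pattern: str, index: int) -> Optional[str]:
    """Model of the result reported for Spark: a null input yields null
    before the group index is ever validated."""
    if value is None:
        return None
    check_index(pattern, index)
    match = re.search(pattern, value)
    if match is None:
        return ""
    group = match.group(index)
    return group if group is not None else ""


def extract_check_first(value: Optional[str], pattern: str, index: int) -> Optional[str]:
    """Model of the failing ordering (an assumption about the cause):
    the index is validated up front, regardless of the input value."""
    check_index(pattern, index)
    return extract_null_first(value, pattern, index)
```

With the values from the report, extract_null_first(None, 'a', 1) returns None, while extract_check_first(None, 'a', 1) raises with the same message as the CH backend. If this reading is right, propagating the null before the index check (or deferring the check to row evaluation) would align the two results.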

Spark version

Spark-3.3.x

Spark configurations

No response

System information

No response

Relevant logs

No response

ditgittube commented 7 months ago

@liuneng1994 could you please help look into this issue?