apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[CH] UT fails with "Memory limit exceeded", but passed with the previous code #7726

Open baibaichen opened 2 days ago

baibaichen commented 2 days ago

Backend

CH (ClickHouse)

Bug description

Run the following test SQL in GlutenClickHouseTPCHColumnarShuffleParquetAQESuite:

select LINEITEM.L_DISCOUNT,
       PART.P_TYPE,
       LINEITEM.L_COMMENT,
       LINEITEM.L_SUPPKEY,
       PART.P_PARTKEY,
       PART.P_SIZE,
       LINEITEM.L_RETURNFLAG,
       LINEITEM.L_RECEIPTDATE,
       PART.P_NAME,
       SUPPLIER.S_COMMENT,
       LINEITEM.L_ORDERKEY,
       PART.P_MFGR,
       SUPPLIER.S_ACCTBAL,
       SUPPLIER.S_SUPPKEY,
       LINEITEM.L_SHIPMODE,
       SUPPLIER.S_NATIONKEY,
       LINEITEM.L_SHIPDATE,
       LINEITEM.L_COMMITDATE,
       SUPPLIER.S_NAME,
       PART.P_COMMENT,
       LINEITEM.L_TAX,
       LINEITEM.L_QUANTITY,
       LINEITEM.L_PARTKEY,
       PART.P_CONTAINER,
       MIN(LINEITEM.L_EXTENDEDPRICE),
       COUNT(LINEITEM.L_QUANTITY),
       COUNT(DISTINCT LINEITEM.L_PARTKEY),
       MIN(LINEITEM.L_TAX),
       MIN(ORDERS.O_TOTALPRICE),
       COUNT(LINEITEM.L_EXTENDEDPRICE),
       COUNT(ORDERS.O_SHIPPRIORITY),
       COUNT(1),
       MAX(LINEITEM.L_DISCOUNT)
from LINEITEM
         INNER JOIN SUPPLIER AS SUPPLIER ON LINEITEM.L_SUPPKEY = SUPPLIER.S_SUPPKEY
         INNER JOIN PART AS PART ON LINEITEM.L_PARTKEY = PART.P_PARTKEY
         INNER JOIN ORDERS AS ORDERS ON LINEITEM.L_ORDERKEY = ORDERS.O_ORDERKEY
where (not (((P_RETAILPRICE is not null or
              ((S_NATIONKEY is not null and P_MFGR like '%Manufacturer#1') or P_BRAND not like 'Brand#11')) or
             ((S_SUPPKEY not in
               (1206, 1496, 1191, 2445, 491, 1407, 1969, 261, 1418, 310, 2099, 1343, 327, 261, 707, 37, 753, 696, 1363,
                628, 1158, 2239, 26, 1180, 2448, 1698, 2099, 1326, 1247, 1203, 161, 1698, 310, 692, 491, 1920, 28, 370,
                370, 261, 2258, 1146, 983, 683, 24, 1611, 5161, 3141, 2258, 1287, 683, 1720, 1887, 310, 707, 1836, 1287,
                2065, 1859, 1203, 1611, 1835, 2099, 701, 2314, 692, 1418, 2367, 425, 1720, 8285, 1969, 1804, 310, 2258,
                1418, 463, 2048, 368, 1253, 549, 2258, 327, 1973, 817) and 1300 > S_SUPPKEY) or (S_PHONE not in
                                                                                                 ('10-246-381-9259',
                                                                                                  '10-211-466-9198',
                                                                                                  '10-509-209-3829',
                                                                                                  '10-741-929-4244',
                                                                                                  '10-393-500-3856',
                                                                                                  '10-495-104-1252',
                                                                                                  '10-983-665-2259',
                                                                                                  '10-295-590-8708',
                                                                                                  '10-983-665-2259',
                                                                                                  '10-745-572-7198',
                                                                                                  '10-384-209-1825',
                                                                                                  '10-734-420-5738',
                                                                                                  '10-845-970-4551',
                                                                                                  '10-630-928-4130',
                                                                                                  '10-325-193-7475',
                                                                                                  '%10-475-868-5521',
                                                                                                  '10-903-990-3612',
                                                                                                  '10-352-443-2162%',
                                                                                                  '10-842-403-7954',
                                                                                                  '10-789-325-3069',
                                                                                                  '10-996-906-4890',
                                                                                                  '10-404-519-2270',
                                                                                                  '10-848-716-8078',
                                                                                                  '10-246-381-9259',
                                                                                                  '10-262-377-2302',
                                                                                                  '10-361-729-1693',
                                                                                                  '10-745-572-7198',
                                                                                                  '10-384-209-1825',
                                                                                                  '10-262-132-6639',
                                                                                                  '10-361-729-1693',
                                                                                                  '10-746-144-5600',
                                                                                                  '10-409-763-8909',
                                                                                                  '10-123-465-1292',
                                                                                                  '10-745-572-7198%',
                                                                                                  '10-599-740-9848',
                                                                                                  '10-453-843-1585',
                                                                                                  '10-191-563-6127',
                                                                                                  '10-848-716-8078',
                                                                                                  '10-763-945-1271',
                                                                                                  '10-393-500-3856') and
                                                                                                 (not (P_NAME not like 'light dark lemon lace medium%' and P_NAME is null))))) or
            ((((S_ADDRESS is null or P_CONTAINER in
                                     ('LG JAR', 'JUMBO CASE', 'JUMBO CASE', 'MED BOX', 'WRAP BAG', 'SM CASE',
                                      'WRAP JAR', 'JUMBO PKG', 'SM CAN', 'SM BOX', 'JUMBO CASE', 'MED BOX', 'LG JAR',
                                      'JUMBO CASE', 'MED DRUM', 'JUMBO PKG', 'SM CAN', 'WRAP JAR', 'LG CASE', 'LG BAG',
                                      'SM PACK', 'JUMBO DRUM', 'WRAP BOX', 'JUMBO CAN', 'LG PKG', 'WRAP CAN',
                                      'MED PACK', 'SM BOX', 'SM DRUM', 'SM PACK', 'MED DRUM', 'MED PACK', 'MED BOX',
                                      'MED CAN%', 'SM JAR', 'SM CAN', 'JUMBO BOX', 'JUMBO BAG', 'LG BAG', 'LG PKG',
                                      'LG PACK', 'LG BAG', 'JUMBO BOX', 'SM BOX', 'JUMBO CAN', 'JUMBO PKG', 'LG BAG',
                                      'MED BOX', 'JUMBO CASE', 'MED BOX', 'LG BAG', 'LG PACK', 'MED BOX', 'LG PKG',
                                      'SM BOX', 'WRAP BOX', 'LG CASE', 'MED PACK', 'LG PKG', '%LG CASE', 'LG JAR',
                                      'LG BAG', 'LG BOX', 'SM CAN', 'WRAP CAN', 'WRAP PACK', 'JUMBO CASE', 'SM BOX',
                                      'SM PACK', 'WRAP PKG', 'MED CAN', 'SM BOX', 'LG CASE', 'JUMBO CAN', 'LG JAR',
                                      'SM DRUM', 'MED PKG', 'JUMBO BAG', 'SM CASE', 'MED BAG', 'SM PACK',
                                      'SM PACK')) and S_SUPPKEY is not null) and (P_PARTKEY not in
                                                                                  (1358682, 1592117, 1114403, 839396,
                                                                                   1114617, 959268, 1114713, 1358631,
                                                                                   806397, 959018, 1114926, 812800,
                                                                                   1568237, 959088, 839340, 959419,
                                                                                   1115053, 1358740, 1114282) and
                                                                                  (S_SUPPKEY between 463 and 1887 or
                                                                                   S_SUPPKEY not in
                                                                                   (1287, 1422, 1878, 1191, 1804, 476,
                                                                                    1097, 1326, 1597, 1158, 261, 1689,
                                                                                    1493, 2314, 817, 1097, 2239, 327,
                                                                                    1887, 118, 1547, 476, 2131, 1247,
                                                                                    1496, 1698, 1717, 454, 1692, 1920,
                                                                                    1973, 2010, 1804, 774, 1611, 425,
                                                                                    28, 1611, 183, 983, 800, 5915, 1311,
                                                                                    24, 2298, 118, 183, 784, 1592, 1549,
                                                                                    983, 1283, 1418, 291, 118, 1407,
                                                                                    2072, 291, 1180, 1404, 1097, 1724,
                                                                                    1611, 692, 491, 316, 161, 2314,
                                                                                    1404, 696, 2072, 2072, 491, 1692,
                                                                                    764, 742, 118, 425)))) and
             (P_CONTAINER in
              ('SM PKG', 'LG PKG', 'LG CASE', 'MED PKG', 'WRAP JAR', 'LG BAG', 'SM BOX', 'JUMBO BOX', 'SM PKG',
               'SM PKG', 'JUMBO BOX', 'MED BOX', 'JUMBO PKG', 'WRAP CAN', 'MED DRUM', 'MED JAR', 'SM BAG', 'MED CAN',
               'SM PACK', 'SM CASE', 'MED BAG', 'JUMBO PKG', 'LG CASE', 'SM PKG', 'MED BOX', 'LG CASE', 'JUMBO DRUM',
               'MED BAG', 'JUMBO CASE', 'SM BOX', 'JUMBO PACK', 'WRAP BOX', '%JUMBO BOX', 'JUMBO BOX', 'JUMBO CASE',
               'SM CAN', 'JUMBO BOX', 'SM CAN', 'LG CASE') and P_BRAND is null))))
   or (((P_BRAND is not null or (P_SIZE not in
                                 (25, 11, 48, 15, 48, 16, 3, 45, 37, 42, 47, 42, 42, 16, 97, 16, 48, 12, 87, 13, 27, 22,
                                  42, 37, 50, 9, 34) and S_NATIONKEY >= 0)) or 955.65 = S_ACCTBAL) or
       (P_TYPE not like 'MEDIUM POLISHED STEEL%' or (not S_ACCTBAL is null)))
group by LINEITEM.L_DISCOUNT, PART.P_TYPE, LINEITEM.L_COMMENT, LINEITEM.L_SUPPKEY, PART.P_PARTKEY, PART.P_SIZE,
         LINEITEM.L_RETURNFLAG, LINEITEM.L_RECEIPTDATE, PART.P_NAME, SUPPLIER.S_COMMENT, LINEITEM.L_ORDERKEY,
         PART.P_MFGR, SUPPLIER.S_ACCTBAL, SUPPLIER.S_SUPPKEY, LINEITEM.L_SHIPMODE, SUPPLIER.S_NATIONKEY,
         LINEITEM.L_SHIPDATE, LINEITEM.L_COMMITDATE, SUPPLIER.S_NAME, PART.P_COMMENT, LINEITEM.L_TAX,
         LINEITEM.L_QUANTITY, LINEITEM.L_PARTKEY, PART.P_CONTAINER

The query fails with the following error. It ran without error on the previous code.

org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Memory limit exceeded: would use 512.29 MiB (attempt to allocate chunk of 4224948 bytes), current RSS 1.21 GiB, maximum: 512.00 MiB.: While executing GraceAggregatingTransform
0. ..../SourceCode/rebase_ch/contrib/llvm-project/libcxx/include/exception:141: Poco::Exception::Exception(String const&, int) @ 0x0000000014561359
1. ..../SourceCode/rebase_ch/src/Common/Exception.cpp:109: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c2aa3d9
2. ..../SourceCode/rebase_ch/src/Common/Exception.h:110: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000006790a2c
3. ..../SourceCode/rebase_ch/src/Common/Exception.h:128: DB::Exception::Exception<char const*, char const*, String, long&, String, String, char const*, std::basic_string_view<char, std::char_traits<char>>>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<long&>::type, std::type_identity<String>::type, std::type_identity<String>::type, std::type_identity<char const*>::type, std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type>, char const*&&, char const*&&, String&&, long&, String&&, String&&, char const*&&, std::basic_string_view<char, std::char_traits<char>>&&) @ 0x000000000c2d10c9
4. ..../SourceCode/rebase_ch/src/Common/MemoryTracker.cpp:316: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000c2d00d2
5. ..../SourceCode/rebase_ch/src/Common/MemoryTracker.cpp:373: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000c2cfc96
6. ..../SourceCode/rebase_ch/src/Common/CurrentMemoryTracker.cpp:64: CurrentMemoryTracker::allocImpl(long, bool) @ 0x000000000c27899e
7. ..../SourceCode/rebase_ch/src/Common/Allocator.cpp:233: Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x000000000c2772db
8. void DB::PODArrayBase<8ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::realloc<>(unsigned long) @ 0x0000000006884976
9. DB::ColumnString::insertFrom(DB::IColumn const&, unsigned long) @ 0x000000000f01fecd
10. ..../SourceCode/rebase_ch/src/Columns/ColumnNullable.cpp:273: DB::ColumnNullable::insertFrom(DB::IColumn const&, unsigned long) @ 0x0000000010c1bd1f
11. ..../SourceCode/rebase_ch/src/Columns/IColumn.cpp:149: DB::IColumnHelper<DB::ColumnNullable, DB::IColumn>::scatter(unsigned long, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 63ul, 64ul> const&) const @ 0x0000000010e80b79
12. ..../SourceCode/rebase_ch/src/Interpreters/JoinUtils.cpp:569: DB::JoinCommon::scatterBlockByHash(std::vector<String, std::allocator<String>> const&, DB::Block const&, unsigned long) @ 0x0000000010369286
13. ..../SourceCode/rebase_ch/utils/extern-local-engine/Operator/GraceAggregatingTransform.cpp:268: local_engine::GraceAggregatingTransform::scatterBlock(DB::Block const&) @ 0x000000000c805f63
14. ..../SourceCode/rebase_ch/utils/extern-local-engine/Operator/GraceAggregatingTransform.cpp:495: local_engine::GraceAggregatingTransform::mergeOneBlock(DB::Block const&, bool) @ 0x000000000c80285f
15. ..../SourceCode/rebase_ch/utils/extern-local-engine/Operator/GraceAggregatingTransform.cpp:383: local_engine::GraceAggregatingTransform::prepareBucketOutputBlocks(unsigned long) @ 0x000000000c803453
16. ..../SourceCode/rebase_ch/utils/extern-local-engine/Operator/GraceAggregatingTransform.cpp:158: local_engine::GraceAggregatingTransform::work() @ 0x000000000c8021c8
17. ..../SourceCode/rebase_ch/src/Processors/Executors/ExecutionThreadContext.cpp:47: DB::ExecutionThreadContext::executeTask() @ 0x00000000118b16a2
18. ..../SourceCode/rebase_ch/src/Processors/Executors/PipelineExecutor.cpp:289: DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000118a695f
19. ..../SourceCode/rebase_ch/src/Processors/Executors/PipelineExecutor.cpp:163: DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x00000000118a63c9
20. ..../SourceCode/rebase_ch/src/Processors/Executors/PullingPipelineExecutor.cpp:54: DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x00000000118b8134
21. ..../SourceCode/rebase_ch/src/Processors/Executors/PullingPipelineExecutor.cpp:65: DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x00000000118b8299
22. ..../SourceCode/rebase_ch/utils/extern-local-engine/Parser/LocalExecutor.cpp:68: local_engine::LocalExecutor::hasNext() @ 0x000000000c672db1
23. ..../SourceCode/rebase_ch/utils/extern-local-engine/local_engine_jni.cpp:308: Java_org_apache_gluten_vectorized_BatchIterator_nativeHasNext @ 0x0000000006776237

  at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:41)
  at org.apache.gluten.backendsapi.clickhouse.CollectMetricIterator.hasNext(CHIteratorApi.scala:349)
  at org.apache.gluten.vectorized.CloseableCHColumnBatchIterator.$anonfun$hasNext$1(CloseableCHColumnBatchIterator.scala:42)
  at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
  at org.apache.gluten.metrics.GlutenTimeMetric$.withNanoTime(GlutenTimeMetric.scala:41)
  at org.apache.gluten.vectorized.CloseableCHColumnBatchIterator.hasNext(CloseableCHColumnBatchIterator.scala:42)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1931)
  at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1274)
  at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1274)
  at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2268)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:136)
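
To make the figures in the error message easier to read, here is a minimal Python sketch that pulls them out of the text and converts them to MiB. The helper name `parse_memory_error` and the regular expression are illustrative, not part of Gluten or ClickHouse, and the pattern assumes the "would use" and "maximum" values are reported in MiB, as they are in this report.

```python
import re

# Error text copied from the report above.
MESSAGE = (
    "Memory limit exceeded: would use 512.29 MiB "
    "(attempt to allocate chunk of 4224948 bytes), "
    "current RSS 1.21 GiB, maximum: 512.00 MiB."
)

MIB = 1024 * 1024


def parse_memory_error(message: str) -> dict[str, float]:
    """Extract the figures (in MiB) from a 'Memory limit exceeded' message.

    Hypothetical helper: it only handles messages whose 'would use' and
    'maximum' values are expressed in MiB.
    """
    pattern = (
        r"would use (?P<would_use>[\d.]+) MiB "
        r"\(attempt to allocate chunk of (?P<chunk>\d+) bytes\).*"
        r"maximum: (?P<maximum>[\d.]+) MiB"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("unrecognized message format")
    would_use = float(match["would_use"])
    chunk = int(match["chunk"]) / MIB
    maximum = float(match["maximum"])
    return {
        "would_use_mib": would_use,
        "chunk_mib": chunk,
        "maximum_mib": maximum,
        # Tracked usage just before the failed allocation.
        "tracked_before_mib": would_use - chunk,
        # How far the allocation would have gone past the limit.
        "overshoot_mib": would_use - maximum,
    }


figures = parse_memory_error(MESSAGE)
for name, value in figures.items():
    print(f"{name}: {value:.2f}")
```

Read this way, the failed allocation is a chunk of roughly 4 MiB requested while tracked usage was already within a few MiB of the 512 MiB limit, so the limit would have been exceeded by well under 1 MiB.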

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

lgbo-ustc commented 2 days ago

Try setting spark.gluten.sql.columnar.backend.ch.runtime_config.max_allowed_memory_usage_ratio_for_aggregate_merging to a smaller value, e.g. 0.5.
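
One way to pass this setting is as a `--conf` argument to `spark-submit`. The Python sketch below only builds that argument list. The helper name `conf_args` is hypothetical, and the key and the value 0.5 are taken from the comment above. Inside the test suite, the same key/value pair would presumably go into the suite's Spark configuration instead.

```python
import shlex

# Key quoted from the comment above; 0.5 is the value suggested there.
RATIO_KEY = (
    "spark.gluten.sql.columnar.backend.ch.runtime_config."
    "max_allowed_memory_usage_ratio_for_aggregate_merging"
)


def conf_args(conf: dict[str, str]) -> list[str]:
    """Turn a mapping of Spark properties into spark-submit `--conf` arguments.

    Hypothetical helper for illustration; it does not validate keys or values.
    """
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


args = conf_args({RATIO_KEY: "0.5"})
# Print the fragment to append to a spark-submit command line.
print(shlex.join(args))
```

Building the arguments from a mapping keeps the long key in one place, which helps when trying several ratio values to see where the failure stops.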