apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[enhance](nereids) add rewrite rule SplitJoinForNullSkew #44357

Open feiniaofeiafei opened 1 day ago

feiniaofeiafei commented 1 day ago

What problem does this PR solve?

    LogicalJoin(type:left join, hashConjuncts:t1.a=t2.a)
      +--Plan1(output:t1.a)
      +--Plan2(output:t2.a)
    ->
    LogicalUnion
      +--LogicalFilter(t1.a is null)
        +--Plan1
      +--LogicalJoin(type:left join, t1.a=t2.a)
        +--LogicalFilter(t1.a is not null)
          +--Plan1
        +--Plan2

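The join key sometimes has NULL-value skew, meaning a large share of its rows have a NULL key, and this can lead to prolonged execution times. This rule splits the join into two parts based on whether the join key is NULL. Rows with a NULL key can never satisfy `t1.a = t2.a`, so they bypass the join and go straight to the union; only rows with a non-NULL key are joined. This can accelerate the query.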
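The equivalence behind the rewrite can be checked with a small sketch. This is an illustration only, not the Nereids implementation: the helpers `left_join` and `split_left_join` are hypothetical, each table is modeled as a plain list of key values, and the sketch assumes the NULL branch of the union pads the right-side column with NULL so both union inputs have the same shape.

```python
from collections import Counter

def left_join(left, right):
    """Left outer join on a single key column; None (SQL NULL) never matches."""
    out = []
    for a in left:
        matches = [b for b in right if a is not None and b is not None and a == b]
        if matches:
            out.extend((a, b) for b in matches)
        else:
            # Unmatched left rows are kept and padded with NULL.
            out.append((a, None))
    return out

def split_left_join(left, right):
    """Hypothetical model of the rewritten plan: union of two branches."""
    # Branch 1: rows with a NULL key skip the join (assumed NULL padding).
    null_part = [(a, None) for a in left if a is None]
    # Branch 2: only non-NULL keys go through the left join.
    non_null_part = left_join([a for a in left if a is not None], right)
    return null_part + non_null_part

t1 = [1, None, 2, None, None, 3]   # skewed toward NULL keys
t2 = [1, 1, 3, None]

original = left_join(t1, t2)
rewritten = split_left_join(t1, t2)

# Row order is not significant, so compare the results as multisets.
assert Counter(original) == Counter(rewritten)
```

In this model the NULL-keyed rows of `t1` never enter the join at all, which is the part of the work the rule is meant to take off the join operator.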

Release note

None

Check List (For Author)

Check List (For Reviewer who merge this PR)

doris-robot commented 1 day ago

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?
feiniaofeiafei commented 1 day ago

run buildall

feiniaofeiafei commented 1 day ago

run buildall

feiniaofeiafei commented 1 day ago

run buildall

feiniaofeiafei commented 1 day ago

run buildall

feiniaofeiafei commented 1 day ago

run p0

doris-robot commented 1 day ago
TPC-H: Total hot run time: 39963 ms

```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1b40cc50be96015ccb1eb034d44273a10393bea4, data reload: false

------ Round 1 ----------------------------------
q1  17954  7517  7285  7285
q2  2046  173  170  170
q3  10794  1123  1210  1123
q4  10487  747  760  747
q5  7592  2756  2692  2692
q6  247  146  148  146
q7  975  622  609  609
q8  9263  1875  1960  1875
q9  6639  6471  6402  6402
q10  6954  2293  2328  2293
q11  452  265  261  261
q12  428  222  214  214
q13  17796  3062  3037  3037
q14  250  213  216  213
q15  581  525  523  523
q16  647  595  601  595
q17  987  614  586  586
q18  7403  6656  6760  6656
q19  1346  1007  935  935
q20  491  187  177  177
q21  3970  3269  3115  3115
q22  380  322  309  309
Total cold run time: 107682 ms
Total hot run time: 39963 ms

----- Round 2, with runtime_filter_mode=off -----
q1  7295  7334  7268  7268
q2  329  225  231  225
q3  2909  2869  2967  2869
q4  2127  1900  1908  1900
q5  5677  5682  5755  5682
q6  222  140  142  140
q7  2240  1831  1843  1831
q8  3391  3563  3569  3563
q9  8741  8913  8977  8913
q10  3620  3566  3585  3566
q11  601  523  507  507
q12  860  623  606  606
q13  10727  3303  3204  3204
q14  321  286  265  265
q15  590  526  541  526
q16  703  638  652  638
q17  1857  1646  1593  1593
q18  8484  7881  7796  7796
q19  2247  1642  1590  1590
q20  2112  1878  1948  1878
q21  5638  5455  5443  5443
q22  650  562  570  562
Total cold run time: 71341 ms
Total hot run time: 60565 ms
```
doris-robot commented 1 day ago
TPC-DS: Total hot run time: 198243 ms

```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1b40cc50be96015ccb1eb034d44273a10393bea4, data reload: false

query1  1253  952  910  910
query2  6239  2113  2119  2113
query3  10789  4028  4231  4028
query4  68158  29670  23621  23621
query5  5279  489  483  483
query6  445  184  200  184
query7  5871  313  295  295
query8  321  242  231  231
query9  9350  2683  2675  2675
query10  496  260  264  260
query11  17828  15780  15935  15780
query12  154  113  120  113
query13  1621  447  423  423
query14  11559  8163  8193  8163
query15  224  183  198  183
query16  7030  434  451  434
query17  1080  561  566  561
query18  1781  294  293  293
query19  195  173  156  156
query20  120  112  110  110
query21  205  102  110  102
query22  4625  4390  4273  4273
query23  34975  34188  34174  34174
query24  5319  2469  2511  2469
query25  522  388  391  388
query26  646  153  152  152
query27  1694  288  283  283
query28  4320  2421  2393  2393
query29  728  418  416  416
query30  205  147  147  147
query31  1028  813  831  813
query32  68  56  56  56
query33  408  280  298  280
query34  982  519  526  519
query35  859  765  725  725
query36  1145  962  981  962
query37  126  75  76  75
query38  4347  4289  4261  4261
query39  1487  1455  1437  1437
query40  207  101  100  100
query41  46  41  44  41
query42  112  101  100  100
query43  556  498  494  494
query44  1208  830  809  809
query45  188  167  167  167
query46  1197  735  709  709
query47  1910  1814  1828  1814
query48  420  324  326  324
query49  735  403  399  399
query50  872  389  418  389
query51  7353  7211  7113  7113
query52  100  87  91  87
query53  270  183  180  180
query54  544  402  406  402
query55  81  78  73  73
query56  265  243  247  243
query57  1307  1174  1165  1165
query58  231  220  224  220
query59  3171  3020  2925  2925
query60  292  258  258  258
query61  113  109  110  109
query62  824  683  701  683
query63  229  188  192  188
query64  1428  678  661  661
query65  3394  3298  3260  3260
query66  721  335  330  330
query67  15791  15614  15836  15614
query68  4259  594  566  566
query69  415  261  267  261
query70  1200  1154  1146  1146
query71  372  274  251  251
query72  6321  4125  4138  4125
query73  817  374  364  364
query74  10181  9028  9035  9028
query75  3545  2698  2699  2698
query76  1937  1132  1273  1132
query77  516  291  290  290
query78  10595  9572  9362  9362
query79  1626  598  602  598
query80  876  436  443  436
query81  531  241  238  238
query82  232  119  125  119
query83  266  158  162  158
query84  266  71  71  71
query85  970  309  330  309
query86  369  312  303  303
query87  4900  4573  4724  4573
query88  3859  2273  2236  2236
query89  439  302  303  302
query90  2169  196  192  192
query91  141  105  112  105
query92  62  49  55  49
query93  2256  542  549  542
query94  917  288  300  288
query95  342  252  254  252
query96  684  275  283  275
query97  2949  2659  2700  2659
query98  224  204  199  199
query99  1712  1323  1318  1318
Total cold run time: 322995 ms
Total hot run time: 198243 ms
```
doris-robot commented 1 day ago
ClickBench: Total hot run time: 32.02 s

```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1b40cc50be96015ccb1eb034d44273a10393bea4, data reload: false

query1  0.03  0.03  0.02
query2  0.07  0.03  0.03
query3  0.24  0.08  0.07
query4  1.63  0.10  0.11
query5  0.43  0.41  0.41
query6  1.17  0.64  0.66
query7  0.02  0.01  0.01
query8  0.04  0.03  0.03
query9  0.57  0.50  0.50
query10  0.55  0.56  0.55
query11  0.14  0.10  0.10
query12  0.15  0.11  0.12
query13  0.61  0.59  0.60
query14  2.70  2.70  2.83
query15  0.91  0.83  0.81
query16  0.38  0.37  0.38
query17  1.07  1.00  1.04
query18  0.20  0.20  0.20
query19  1.99  1.89  2.06
query20  0.01  0.01  0.01
query21  15.36  0.56  0.59
query22  2.37  3.09  1.72
query23  17.25  0.90  0.77
query24  3.45  0.65  1.65
query25  0.28  0.05  0.10
query26  0.48  0.13  0.14
query27  0.05  0.04  0.04
query28  10.45  1.09  1.08
query29  12.56  3.23  3.22
query30  0.25  0.06  0.06
query31  2.87  0.37  0.36
query32  3.30  0.46  0.48
query33  2.97  3.12  3.03
query34  17.06  4.52  4.48
query35  4.53  4.50  4.50
query36  0.65  0.48  0.48
query37  0.10  0.06  0.06
query38  0.05  0.03  0.03
query39  0.03  0.02  0.02
query40  0.16  0.14  0.12
query41  0.08  0.02  0.02
query42  0.04  0.02  0.03
query43  0.04  0.03  0.02
Total cold run time: 107.29 s
Total hot run time: 32.02 s
```