apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[fix](cloud) serialize cache init to avoid unstable cache pick #44429

Open freemandealer opened 8 hours ago

freemandealer commented 8 hours ago

The original parallel cache initialization makes the choice of cache base path unstable. The choice depends on the order in which the caches finish initializing, and that order can differ after every BE restart. As a result, cached data is missed, and the same cache block can end up duplicated across multiple caches, wasting disk space.
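To make the failure mode concrete, here is a minimal C++ sketch. It assumes, for illustration only, that a cache block key is routed to a cache by hashing the key modulo the number of registered caches. CacheStub and pick_cache are hypothetical names, not Doris APIs. Under that assumption, if two restarts register the same two caches in opposite orders, the same key lands on a different base path each time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a BE file cache rooted at one base path.
struct CacheStub {
    std::string base_path;
};

// Assumption (illustration only): a key is routed by hash % number_of_caches.
// The result is a position in `caches`, so which base path a key maps to
// depends on the order in which the caches were registered.
const CacheStub& pick_cache(const std::vector<CacheStub>& caches, const std::string& key) {
    assert(!caches.empty());  // routing is undefined with no caches
    std::size_t idx = std::hash<std::string>{}(key) % caches.size();
    return caches[idx];
}
```

With two caches registered in reverse order, every index points at the other path, so every key moves between restarts. With more caches, any change in registration order moves at least some keys, which matches the cache misses and duplicated blocks described above.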

This commit serializes the initialization of multiple caches and uses a fixed order, namely the order explicitly declared in the BE config file_cache_path.
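A minimal sketch of the serialized approach, assuming the entries of file_cache_path have already been parsed into a list of paths. Caches are initialized one after another and appended in declaration order, so position i always holds the i-th declared path. CacheStub, init_one_cache, and init_caches_serially are hypothetical helpers, not the actual Doris functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a BE file cache rooted at one base path.
struct CacheStub {
    std::string base_path;
};

// Hypothetical per-path initializer. A real BE would create directories,
// load existing cache metadata, and so on; here we only record the path.
CacheStub init_one_cache(const std::string& path) {
    assert(!path.empty());  // a declared cache path must not be empty
    return CacheStub{path};
}

// Initialize caches one at a time, in the order declared in the config.
// Each cache is appended only after the previous one has finished, so the
// resulting order is identical on every restart.
std::vector<CacheStub> init_caches_serially(const std::vector<std::string>& declared_paths) {
    std::vector<CacheStub> caches;
    caches.reserve(declared_paths.size());
    for (const auto& path : declared_paths) {
        caches.push_back(init_one_cache(path));
    }
    return caches;
}
```

The trade-off is that startup may take longer when several disks are configured, since per-disk initialization no longer overlaps; in exchange, the mapping from position to base path stays stable across restarts.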

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Check List (For Reviewer who merge this PR)

doris-robot commented 8 hours ago

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?
freemandealer commented 8 hours ago

run buildall

github-actions[bot] commented 8 hours ago

clang-tidy review says "All clean, LGTM! :+1:"

doris-robot commented 8 hours ago
TPC-H: Total hot run time: 40232 ms

```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 396010b250b74855ae7fd4ece1894cf0ca8c084a, data reload: false

------ Round 1 ----------------------------------
q1	17577	7484	7317	7317
q2	2040	183	169	169
q3	10593	1071	1210	1071
q4	10569	744	675	675
q5	7605	2776	2727	2727
q6	245	151	147	147
q7	1002	625	599	599
q8	9235	1843	1981	1843
q9	6499	6465	6403	6403
q10	6972	2310	2342	2310
q11	460	262	276	262
q12	431	224	218	218
q13	17777	3061	3062	3061
q14	250	232	214	214
q15	568	534	527	527
q16	640	573	588	573
q17	978	591	599	591
q18	7351	6880	6702	6702
q19	1341	1021	1000	1000
q20	499	186	191	186
q21	4181	3384	3320	3320
q22	391	320	317	317
Total cold run time: 107204 ms
Total hot run time: 40232 ms
----- Round 2, with runtime_filter_mode=off -----
q1	7829	7304	7253	7253
q2	328	230	232	230
q3	3007	2985	3024	2985
q4	2181	1875	1849	1849
q5	5690	5719	5775	5719
q6	226	146	145	145
q7	2320	1854	1870	1854
q8	3439	3571	3581	3571
q9	8853	8962	8947	8947
q10	3657	3612	3586	3586
q11	618	521	519	519
q12	829	626	606	606
q13	11779	3290	3238	3238
q14	306	273	264	264
q15	575	535	527	527
q16	680	667	652	652
q17	1910	1661	1660	1660
q18	8501	7796	7710	7710
q19	1712	1532	1686	1532
q20	2160	1892	1871	1871
q21	5757	5578	5467	5467
q22	658	607	587	587
Total cold run time: 73015 ms
Total hot run time: 60772 ms
```
doris-robot commented 8 hours ago

TeamCity be ut coverage result:
Function Coverage: 38.04% (9903/26033)
Line Coverage: 29.23% (82871/283529)
Region Coverage: 28.35% (42549/150086)
Branch Coverage: 24.91% (21573/86592)
Coverage Report: http://coverage.selectdb-in.cc/coverage/396010b250b74855ae7fd4ece1894cf0ca8c084a_396010b250b74855ae7fd4ece1894cf0ca8c084a/report/index.html

doris-robot commented 7 hours ago
TPC-DS: Total hot run time: 197177 ms

```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 396010b250b74855ae7fd4ece1894cf0ca8c084a, data reload: false

query1	1244	955	912	912
query2	6249	2090	2073	2073
query3	10811	4040	4012	4012
query4	67800	29116	23582	23582
query5	4886	486	465	465
query6	415	181	173	173
query7	5549	302	289	289
query8	306	222	216	216
query9	8787	2679	2680	2679
query10	429	245	245	245
query11	17160	15124	15993	15124
query12	162	114	105	105
query13	1490	453	442	442
query14	10013	6980	6930	6930
query15	224	190	178	178
query16	7100	471	476	471
query17	1408	573	567	567
query18	1862	314	304	304
query19	202	156	156	156
query20	121	112	124	112
query21	203	107	100	100
query22	4924	4734	4511	4511
query23	34486	34493	34507	34493
query24	5906	2528	2561	2528
query25	481	385	409	385
query26	686	145	146	145
query27	2287	283	287	283
query28	4670	2495	2472	2472
query29	678	438	420	420
query30	226	155	144	144
query31	989	818	849	818
query32	68	55	57	55
query33	434	278	289	278
query34	935	522	524	522
query35	864	731	716	716
query36	1104	966	984	966
query37	118	76	73	73
query38	4620	4510	4409	4409
query39	1511	1476	1449	1449
query40	203	97	103	97
query41	45	45	44	44
query42	108	100	103	100
query43	554	509	505	505
query44	1227	850	857	850
query45	188	170	202	170
query46	1145	694	723	694
query47	2048	1921	1960	1921
query48	431	321	341	321
query49	738	392	387	387
query50	857	386	394	386
query51	7453	7201	7095	7095
query52	102	88	89	88
query53	248	176	189	176
query54	514	410	390	390
query55	79	78	76	76
query56	262	234	251	234
query57	1282	1189	1186	1186
query58	239	235	221	221
query59	3322	3216	3036	3036
query60	284	295	256	256
query61	134	132	131	131
query62	793	679	677	677
query63	219	195	203	195
query64	1487	736	630	630
query65	3281	3231	3254	3231
query66	712	307	309	307
query67	16375	15705	15847	15705
query68	3893	585	588	585
query69	433	252	253	252
query70	1153	1133	1166	1133
query71	359	244	247	244
query72	6476	4114	4011	4011
query73	761	366	364	364
query74	10374	9132	8962	8962
query75	3432	2801	2691	2691
query76	1796	1135	1147	1135
query77	565	294	280	280
query78	10559	9515	9412	9412
query79	1636	595	609	595
query80	924	435	430	430
query81	509	227	272	227
query82	1269	117	118	117
query83	278	149	151	149
query84	279	75	69	69
query85	911	312	305	305
query86	340	309	307	307
query87	4781	4631	4799	4631
query88	3768	2265	2220	2220
query89	421	289	287	287
query90	2039	185	184	184
query91	138	105	103	103
query92	65	47	54	47
query93	1911	542	550	542
query94	861	294	280	280
query95	341	247	247	247
query96	625	278	276	276
query97	2880	2682	2737	2682
query98	220	198	201	198
query99	1615	1310	1301	1301
Total cold run time: 321350 ms
Total hot run time: 197177 ms
```
doris-robot commented 7 hours ago
ClickBench: Total hot run time: 32.1 s

```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 396010b250b74855ae7fd4ece1894cf0ca8c084a, data reload: false

query1	0.03	0.03	0.02
query2	0.07	0.03	0.04
query3	0.23	0.07	0.06
query4	1.63	0.10	0.10
query5	0.41	0.42	0.42
query6	1.15	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.02
query9	0.58	0.49	0.51
query10	0.56	0.56	0.53
query11	0.14	0.11	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.61
query14	2.70	2.70	2.77
query15	0.90	0.83	0.83
query16	0.39	0.39	0.39
query17	1.07	1.02	1.05
query18	0.22	0.22	0.20
query19	1.89	1.75	1.88
query20	0.01	0.01	0.01
query21	15.40	0.58	0.59
query22	2.62	2.35	1.76
query23	17.18	0.83	0.92
query24	3.22	0.62	1.47
query25	0.30	0.24	0.04
query26	0.34	0.13	0.13
query27	0.04	0.05	0.05
query28	10.75	1.09	1.07
query29	12.56	3.30	3.25
query30	0.25	0.07	0.06
query31	2.94	0.39	0.38
query32	3.67	0.48	0.47
query33	3.02	3.09	3.17
query34	16.81	4.45	4.50
query35	4.51	4.48	4.50
query36	0.68	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.17	0.13	0.13
query41	0.07	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 107.57 s
Total hot run time: 32.1 s
```