NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0
822 stars 235 forks source link

repartition-based fallback for hash aggregate v3 #11712

Open binmahone opened 1 week ago

binmahone commented 1 week ago

This PR replaces https://github.com/NVIDIA/spark-rapids/pull/11116, since there has been too many differences with #11116.

abellina commented 1 week ago

I will review this today

binmahone commented 1 week ago

Hi @revans2 and @abellina, this PR, as I stated in Slack and marked in itself, is still in DRAFT, so I didn't expect a thorough review on this before I change it from Draft to Ready. (Or we wee already have some special usage of Draft PR in our dev process?) Some major problems such as mixing other features(e.g. so called voluntary release check), not working implementation of skipping first iteration agg, etc. The refinement is not ready last Friday but it is now. Please go ahead and review.

The reason why I showed you this PR is for proving "That we do a complete pass over all of the batches and cache them into GPU memory before we make a decision on when and how to release a batch" is no longer true. I assume having access to the state-of-art implementation of agg in our customer may help you better understand the problem we're facing now.

binmahone commented 1 week ago

Next steps for this PR will be:

  1. Addressing review comments. When reviewing please aware that we found repartition-based agg is more prone to "Container OOM (then get killed by K8S)" than sort based. I'm troubleshooting on my side, but any insight from the reviewer would be a great help.
  2. We have selected four representative queries at our customer, and before/after result is being collected. Part of the collected results has already shown improvements.
  3. I'll run regression test on NDS to ensure no/minor regression.
binmahone commented 2 days ago

NDS regression test result on Spark2a (default 3K dataset):

Regression alerts
-----------------

Speedup results
-----------------
query1: Previous (1809.3333333333333 ms) vs Current (1895.0 ms) Diff -85 E2E 0.95x
query2: Previous (1640.6666666666667 ms) vs Current (2042.0 ms) Diff -401 E2E 0.80x
query3: Previous (565.3333333333334 ms) vs Current (567.0 ms) Diff -1 E2E 1.00x
query4: Previous (11023.333333333334 ms) vs Current (11092.666666666666 ms) Diff -69 E2E 0.99x
query5: Previous (2253.6666666666665 ms) vs Current (2584.3333333333335 ms) Diff -330 E2E 0.87x
query6: Previous (778.3333333333334 ms) vs Current (821.6666666666666 ms) Diff -43 E2E 0.95x
query7: Previous (3349.0 ms) vs Current (3399.6666666666665 ms) Diff -50 E2E 0.99x
query8: Previous (1160.6666666666667 ms) vs Current (1158.0 ms) Diff 2 E2E 1.00x
query9: Previous (9345.333333333334 ms) vs Current (5375.666666666667 ms) Diff 3969 E2E 1.74x
query10: Previous (1755.6666666666667 ms) vs Current (1872.0 ms) Diff -116 E2E 0.94x
query11: Previous (5698.0 ms) vs Current (5664.333333333333 ms) Diff 33 E2E 1.01x
query12: Previous (684.3333333333334 ms) vs Current (645.6666666666666 ms) Diff 38 E2E 1.06x
query13: Previous (1791.6666666666667 ms) vs Current (1822.3333333333333 ms) Diff -30 E2E 0.98x
query14_part1: Previous (7540.666666666667 ms) vs Current (8108.333333333333 ms) Diff -567 E2E 0.93x
query14_part2: Previous (6580.0 ms) vs Current (6696.0 ms) Diff -116 E2E 0.98x
query15: Previous (1225.3333333333333 ms) vs Current (1218.0 ms) Diff 7 E2E 1.01x
query16: Previous (4012.3333333333335 ms) vs Current (4044.3333333333335 ms) Diff -32 E2E 0.99x
query17: Previous (2047.3333333333333 ms) vs Current (2044.6666666666667 ms) Diff 2 E2E 1.00x
query18: Previous (2143.0 ms) vs Current (2340.0 ms) Diff -197 E2E 0.92x
query19: Previous (1456.6666666666667 ms) vs Current (1411.6666666666667 ms) Diff 45 E2E 1.03x
query20: Previous (706.6666666666666 ms) vs Current (731.3333333333334 ms) Diff -24 E2E 0.97x
query21: Previous (755.0 ms) vs Current (714.3333333333334 ms) Diff 40 E2E 1.06x
query22: Previous (1390.6666666666667 ms) vs Current (1477.0 ms) Diff -86 E2E 0.94x
query23_part1: Previous (13383.666666666666 ms) vs Current (14087.0 ms) Diff -703 E2E 0.95x
query23_part2: Previous (17956.666666666668 ms) vs Current (18554.333333333332 ms) Diff -597 E2E 0.97x
query24_part1: Previous (9003.0 ms) vs Current (9031.0 ms) Diff -28 E2E 1.00x
query24_part2: Previous (8799.666666666666 ms) vs Current (9124.0 ms) Diff -324 E2E 0.96x
query25: Previous (1918.6666666666667 ms) vs Current (2010.6666666666667 ms) Diff -92 E2E 0.95x
query26: Previous (1468.0 ms) vs Current (1388.3333333333333 ms) Diff 79 E2E 1.06x
query27: Previous (1416.6666666666667 ms) vs Current (1538.3333333333333 ms) Diff -121 E2E 0.92x
query28: Previous (7904.333333333333 ms) vs Current (7976.0 ms) Diff -71 E2E 0.99x
query29: Previous (2934.3333333333335 ms) vs Current (3454.0 ms) Diff -519 E2E 0.85x
query30: Previous (2604.6666666666665 ms) vs Current (2329.6666666666665 ms) Diff 275 E2E 1.12x
query31: Previous (2442.3333333333335 ms) vs Current (2420.6666666666665 ms) Diff 21 E2E 1.01x
query32: Previous (1191.0 ms) vs Current (948.0 ms) Diff 243 E2E 1.26x
query33: Previous (1175.6666666666667 ms) vs Current (1197.3333333333333 ms) Diff -21 E2E 0.98x
query34: Previous (2532.6666666666665 ms) vs Current (2481.6666666666665 ms) Diff 51 E2E 1.02x
query35: Previous (2159.3333333333335 ms) vs Current (2045.0 ms) Diff 114 E2E 1.06x
query36: Previous (1527.0 ms) vs Current (1550.3333333333333 ms) Diff -23 E2E 0.98x
query37: Previous (1414.0 ms) vs Current (1369.6666666666667 ms) Diff 44 E2E 1.03x
query38: Previous (2735.0 ms) vs Current (2708.6666666666665 ms) Diff 26 E2E 1.01x
query39_part1: Previous (2164.0 ms) vs Current (2088.3333333333335 ms) Diff 75 E2E 1.04x
query39_part2: Previous (1595.0 ms) vs Current (1616.3333333333333 ms) Diff -21 E2E 0.99x
query40: Previous (1349.3333333333333 ms) vs Current (1389.0 ms) Diff -39 E2E 0.97x
query41: Previous (401.6666666666667 ms) vs Current (381.3333333333333 ms) Diff 20 E2E 1.05x
query42: Previous (457.6666666666667 ms) vs Current (398.3333333333333 ms) Diff 59 E2E 1.15x
query43: Previous (1125.3333333333333 ms) vs Current (1153.0 ms) Diff -27 E2E 0.98x
query44: Previous (828.6666666666666 ms) vs Current (815.3333333333334 ms) Diff 13 E2E 1.02x
query45: Previous (1332.0 ms) vs Current (1333.0 ms) Diff -1 E2E 1.00x
query46: Previous (1828.6666666666667 ms) vs Current (1610.0 ms) Diff 218 E2E 1.14x
query47: Previous (2201.3333333333335 ms) vs Current (2231.6666666666665 ms) Diff -30 E2E 0.99x
query48: Previous (1289.0 ms) vs Current (1366.3333333333333 ms) Diff -77 E2E 0.94x
query49: Previous (2375.0 ms) vs Current (2423.6666666666665 ms) Diff -48 E2E 0.98x
query50: Previous (9228.333333333334 ms) vs Current (9034.666666666666 ms) Diff 193 E2E 1.02x
query51: Previous (2085.6666666666665 ms) vs Current (2329.0 ms) Diff -243 E2E 0.90x
query52: Previous (591.3333333333334 ms) vs Current (610.3333333333334 ms) Diff -19 E2E 0.97x
query53: Previous (887.0 ms) vs Current (862.6666666666666 ms) Diff 24 E2E 1.03x
query54: Previous (1849.3333333333333 ms) vs Current (1813.0 ms) Diff 36 E2E 1.02x
query55: Previous (518.6666666666666 ms) vs Current (497.6666666666667 ms) Diff 20 E2E 1.04x
query56: Previous (1090.6666666666667 ms) vs Current (1085.6666666666667 ms) Diff 5 E2E 1.00x
query57: Previous (2208.3333333333335 ms) vs Current (1708.3333333333333 ms) Diff 500 E2E 1.29x
query58: Previous (1030.6666666666667 ms) vs Current (1115.3333333333333 ms) Diff -84 E2E 0.92x
query59: Previous (2516.3333333333335 ms) vs Current (2624.0 ms) Diff -107 E2E 0.96x
query60: Previous (1354.0 ms) vs Current (1294.6666666666667 ms) Diff 59 E2E 1.05x
query61: Previous (1527.3333333333333 ms) vs Current (1408.0 ms) Diff 119 E2E 1.08x
query62: Previous (1490.6666666666667 ms) vs Current (1507.3333333333333 ms) Diff -16 E2E 0.99x
query63: Previous (998.0 ms) vs Current (999.3333333333334 ms) Diff -1 E2E 1.00x
query64: Previous (16576.333333333332 ms) vs Current (16515.333333333332 ms) Diff 61 E2E 1.00x
query65: Previous (3914.6666666666665 ms) vs Current (3947.0 ms) Diff -32 E2E 0.99x
query66: Previous (3383.3333333333335 ms) vs Current (3360.0 ms) Diff 23 E2E 1.01x
query67: Previous (28848.0 ms) vs Current (28505.0 ms) Diff 343 E2E 1.01x
query68: Previous (1472.0 ms) vs Current (1400.0 ms) Diff 72 E2E 1.05x
query69: Previous (1523.3333333333333 ms) vs Current (1569.6666666666667 ms) Diff -46 E2E 0.97x
query70: Previous (1876.3333333333333 ms) vs Current (1797.6666666666667 ms) Diff 78 E2E 1.04x
query71: Previous (4076.0 ms) vs Current (4098.666666666667 ms) Diff -22 E2E 0.99x
query72: Previous (3457.6666666666665 ms) vs Current (3291.3333333333335 ms) Diff 166 E2E 1.05x
query73: Previous (1205.0 ms) vs Current (1249.6666666666667 ms) Diff -44 E2E 0.96x
query74: Previous (4539.0 ms) vs Current (4300.666666666667 ms) Diff 238 E2E 1.06x
query75: Previous (8472.0 ms) vs Current (8588.333333333334 ms) Diff -116 E2E 0.99x
query76: Previous (5985.666666666667 ms) vs Current (3096.3333333333335 ms) Diff 2889 E2E 1.93x
query77: Previous (1246.6666666666667 ms) vs Current (1152.6666666666667 ms) Diff 94 E2E 1.08x
query78: Previous (8933.0 ms) vs Current (9010.333333333334 ms) Diff -77 E2E 0.99x
query79: Previous (1256.6666666666667 ms) vs Current (1458.0 ms) Diff -201 E2E 0.86x
query80: Previous (4989.0 ms) vs Current (4801.333333333333 ms) Diff 187 E2E 1.04x
query81: Previous (2466.0 ms) vs Current (2396.6666666666665 ms) Diff 69 E2E 1.03x
query82: Previous (2457.3333333333335 ms) vs Current (2557.0 ms) Diff -99 E2E 0.96x
query83: Previous (835.6666666666666 ms) vs Current (832.6666666666666 ms) Diff 3 E2E 1.00x
query84: Previous (1933.0 ms) vs Current (1569.3333333333333 ms) Diff 363 E2E 1.23x
query85: Previous (1911.6666666666667 ms) vs Current (1751.0 ms) Diff 160 E2E 1.09x
query86: Previous (1103.3333333333333 ms) vs Current (1107.0 ms) Diff -3 E2E 1.00x
query87: Previous (2615.0 ms) vs Current (2565.0 ms) Diff 50 E2E 1.02x
query88: Previous (5482.0 ms) vs Current (5435.0 ms) Diff 47 E2E 1.01x
query89: Previous (1485.0 ms) vs Current (1514.3333333333333 ms) Diff -29 E2E 0.98x
query90: Previous (1197.3333333333333 ms) vs Current (1106.0 ms) Diff 91 E2E 1.08x
query91: Previous (1237.6666666666667 ms) vs Current (1247.0 ms) Diff -9 E2E 0.99x
query92: Previous (725.3333333333334 ms) vs Current (658.0 ms) Diff 67 E2E 1.10x
query93: Previous (10936.333333333334 ms) vs Current (10472.0 ms) Diff 464 E2E 1.04x
query94: Previous (4522.333333333333 ms) vs Current (4524.333333333333 ms) Diff -2 E2E 1.00x
query95: Previous (6579.0 ms) vs Current (6594.0 ms) Diff -15 E2E 1.00x
query96: Previous (5449.0 ms) vs Current (5373.0 ms) Diff 76 E2E 1.01x
query97: Previous (2201.6666666666665 ms) vs Current (2155.3333333333335 ms) Diff 46 E2E 1.02x
query98: Previous (1754.6666666666667 ms) vs Current (1671.3333333333333 ms) Diff 83 E2E 1.05x
query99: Previous (2092.6666666666665 ms) vs Current (2536.3333333333335 ms) Diff -443 E2E 0.83x
benchmark: Previous (356333.3333333333 ms) vs Current (351000.0 ms) Diff 5333 E2E 1.02x
binmahone commented 2 days ago

build

revans2 commented 1 day ago

The premerge tests failed, and there were some memory leaks in there too.