NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0
822 stars, 235 forks

Integrate with kudo #11724

Closed. liurenjie1024 closed this 5 hours ago

liurenjie1024 commented 6 days ago

This PR integrates the kudo serialization format into spark-rapids. For the epic issue, please see https://github.com/NVIDIA/spark-rapids/issues/11590.

liurenjie1024 commented 6 days ago

Currently blocked by https://github.com/NVIDIA/spark-rapids-jni/pull/2596, but it's ready for review.

liurenjie1024 commented 3 days ago

In my local dev env, hash_aggregate_test, join_test, and repart_test currently pass.

The build is broken until https://github.com/NVIDIA/spark-rapids-jni/pull/2601 is merged.

liurenjie1024 commented 3 days ago

build

liurenjie1024 commented 3 days ago

build

liurenjie1024 commented 2 days ago

build

liurenjie1024 commented 2 days ago

build

firestarman commented 2 days ago

We still need integration tests for both JCudfSerialization and Kudo until JCudfSerialization is totally dropped.

liurenjie1024 commented 2 days ago

build

liurenjie1024 commented 2 days ago

> We still need integration tests for both JCudfSerialization and Kudo until JCudfSerialization is totally dropped.

Synced offline with @firestarman, and this is exactly what I meant in https://github.com/NVIDIA/spark-rapids/pull/11724#discussion_r1843509785

liurenjie1024 commented 2 days ago

build

liurenjie1024 commented 2 days ago

build

liurenjie1024 commented 1 day ago

build

liurenjie1024 commented 1 day ago

build

liurenjie1024 commented 22 hours ago

Hi @jlowe, here is the NDS result:

Regression alerts
-----------------
--------------------------------------------------------------------
Name = query37
Means = 3967.0, 3444.5
Time diff = 522.5
Speedup = 1.1516911017564233
T-Test (test statistic, p value, df) = 8.343994880023468, 0.014060997367645014, 2.0
T-Test Confidence Interval = 253.06838078262206, 791.9316192173779
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
--------------------------------------------------------------------
Name = query75
Means = 14516.0, 12377.5
Time diff = 2138.5
Speedup = 1.1727731771359322
T-Test (test statistic, p value, df) = 9.720228613014378, 0.010418812730883943, 2.0
T-Test Confidence Interval = 1191.8943975210916, 3085.1056024789086
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
--------------------------------------------------------------------
Name = query91
Means = 2072.5, 1410.5
Time diff = 662.0
Speedup = 1.4693371144984049
T-Test (test statistic, p value, df) = 4.9159244124719645, 0.03897666760707455, 2.0
T-Test Confidence Interval = 82.58586176947449, 1241.4141382305256
ALERT: significant change has been detected (p-value < 0.05)
ALERT: improvement in performance has been observed
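
As a rough cross-check of the alert block above, the p-value and confidence interval can be recomputed from the reported means and test statistic alone. This is only a sketch: it assumes the report comes from a two-sample t-test with 2 degrees of freedom (as the `df` field suggests) and a 95% interval, neither of which is confirmed in this thread. The helper names are made up for illustration and are not part of any NDS tooling.

```python
import math

def t2_two_sided_p(t):
    """Two-sided p-value for Student's t with exactly 2 degrees of freedom."""
    # For df = 2 the CDF has a closed form: F(t) = 1/2 + t / (2 * sqrt(2 + t^2)),
    # so 2 * (1 - F(|t|)) simplifies to the expression below.
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)

def t2_quantile(p):
    """Inverse CDF of Student's t with 2 degrees of freedom."""
    a = 2.0 * p - 1.0
    return a * math.sqrt(2.0 / (1.0 - a * a))

def confidence_interval(mean_prev, mean_cur, t_stat, level=0.95):
    """Interval for the mean time difference, assuming t = diff / std_err."""
    diff = mean_prev - mean_cur
    std_err = diff / t_stat
    half_width = t2_quantile(0.5 + level / 2.0) * std_err
    return diff - half_width, diff + half_width

# query37 values copied from the report above.
t_stat = 8.343994880023468
p_value = t2_two_sided_p(t_stat)
low, high = confidence_interval(3967.0, 3444.5, t_stat)
print(f"p = {p_value:.6f}, CI = ({low:.2f}, {high:.2f})")
```

Under those assumptions the recomputed values line up closely with the reported p value and confidence interval for query37. With only 2 degrees of freedom the intervals are wide, which is worth keeping in mind when reading the alerts.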

Speedup results
-----------------
query1: Previous (2639.5 ms) vs Current (2431.5 ms) Diff 208 E2E 1.09x
query2: Previous (3575.0 ms) vs Current (3191.0 ms) Diff 384 E2E 1.12x
query3: Previous (808.0 ms) vs Current (771.0 ms) Diff 37 E2E 1.05x
query4: Previous (13477.5 ms) vs Current (13190.0 ms) Diff 287 E2E 1.02x
query5: Previous (5848.5 ms) vs Current (4467.5 ms) Diff 1381 E2E 1.31x
query6: Previous (1888.5 ms) vs Current (1248.5 ms) Diff 640 E2E 1.51x
query7: Previous (3871.5 ms) vs Current (3846.0 ms) Diff 25 E2E 1.01x
query8: Previous (2185.5 ms) vs Current (2152.5 ms) Diff 33 E2E 1.02x
query9: Previous (11043.0 ms) vs Current (8771.0 ms) Diff 2272 E2E 1.26x
query10: Previous (2950.0 ms) vs Current (2849.5 ms) Diff 100 E2E 1.04x
query11: Previous (6987.5 ms) vs Current (7179.0 ms) Diff -191 E2E 0.97x
query12: Previous (973.5 ms) vs Current (1058.5 ms) Diff -85 E2E 0.92x
query13: Previous (2041.5 ms) vs Current (2122.5 ms) Diff -81 E2E 0.96x
query14_part1: Previous (13473.0 ms) vs Current (13205.0 ms) Diff 268 E2E 1.02x
query14_part2: Previous (11956.0 ms) vs Current (11055.5 ms) Diff 900 E2E 1.08x
query15: Previous (2115.0 ms) vs Current (1866.5 ms) Diff 248 E2E 1.13x
query16: Previous (7717.5 ms) vs Current (7850.0 ms) Diff -132 E2E 0.98x
query17: Previous (2771.0 ms) vs Current (2748.0 ms) Diff 23 E2E 1.01x
query18: Previous (3453.5 ms) vs Current (3252.0 ms) Diff 201 E2E 1.06x
query19: Previous (2616.5 ms) vs Current (2551.5 ms) Diff 65 E2E 1.03x
query20: Previous (953.5 ms) vs Current (1188.0 ms) Diff -234 E2E 0.80x
query21: Previous (945.0 ms) vs Current (834.5 ms) Diff 110 E2E 1.13x
query22: Previous (2152.0 ms) vs Current (2079.0 ms) Diff 73 E2E 1.04x
query23_part1: Previous (21963.5 ms) vs Current (23232.5 ms) Diff -1269 E2E 0.95x
query23_part2: Previous (34624.5 ms) vs Current (35806.5 ms) Diff -1182 E2E 0.97x
query24_part1: Previous (16391.5 ms) vs Current (13801.5 ms) Diff 2590 E2E 1.19x
query24_part2: Previous (15305.5 ms) vs Current (13821.0 ms) Diff 1484 E2E 1.11x
query25: Previous (2548.0 ms) vs Current (2559.0 ms) Diff -11 E2E 1.00x
query26: Previous (1549.0 ms) vs Current (1537.5 ms) Diff 11 E2E 1.01x
query27: Previous (1811.5 ms) vs Current (1887.5 ms) Diff -76 E2E 0.96x
query28: Previous (12445.0 ms) vs Current (13100.5 ms) Diff -655 E2E 0.95x
query29: Previous (4790.5 ms) vs Current (4772.0 ms) Diff 18 E2E 1.00x
query30: Previous (3391.5 ms) vs Current (3205.0 ms) Diff 186 E2E 1.06x
query31: Previous (3178.0 ms) vs Current (3296.5 ms) Diff -118 E2E 0.96x
query32: Previous (1576.5 ms) vs Current (1570.5 ms) Diff 6 E2E 1.00x
query33: Previous (1855.0 ms) vs Current (1577.5 ms) Diff 277 E2E 1.18x
query34: Previous (3138.5 ms) vs Current (3168.0 ms) Diff -29 E2E 0.99x
query35: Previous (3400.5 ms) vs Current (3232.5 ms) Diff 168 E2E 1.05x
query36: Previous (1727.5 ms) vs Current (1992.0 ms) Diff -264 E2E 0.87x
query37: Previous (3967.0 ms) vs Current (3444.5 ms) Diff 522 E2E 1.15x
query38: Previous (4052.0 ms) vs Current (4475.0 ms) Diff -423 E2E 0.91x
query39_part1: Previous (2800.5 ms) vs Current (2729.0 ms) Diff 71 E2E 1.03x
query39_part2: Previous (2053.0 ms) vs Current (1965.0 ms) Diff 88 E2E 1.04x
query40: Previous (4161.0 ms) vs Current (3197.0 ms) Diff 964 E2E 1.30x
query41: Previous (429.5 ms) vs Current (482.5 ms) Diff -53 E2E 0.89x
query42: Previous (534.0 ms) vs Current (647.0 ms) Diff -113 E2E 0.83x
query43: Previous (1212.5 ms) vs Current (1299.0 ms) Diff -86 E2E 0.93x
query44: Previous (2387.5 ms) vs Current (1971.0 ms) Diff 416 E2E 1.21x
query45: Previous (2328.0 ms) vs Current (1848.5 ms) Diff 479 E2E 1.26x
query46: Previous (2390.0 ms) vs Current (2266.0 ms) Diff 124 E2E 1.05x
query47: Previous (2705.5 ms) vs Current (2795.5 ms) Diff -90 E2E 0.97x
query48: Previous (1775.0 ms) vs Current (1788.5 ms) Diff -13 E2E 0.99x
query49: Previous (9258.0 ms) vs Current (6874.5 ms) Diff 2383 E2E 1.35x
query50: Previous (12696.0 ms) vs Current (14911.5 ms) Diff -2215 E2E 0.85x
query51: Previous (3313.0 ms) vs Current (3516.0 ms) Diff -203 E2E 0.94x
query52: Previous (946.0 ms) vs Current (880.5 ms) Diff 65 E2E 1.07x
query53: Previous (1340.0 ms) vs Current (1070.0 ms) Diff 270 E2E 1.25x
query54: Previous (2781.5 ms) vs Current (2935.0 ms) Diff -153 E2E 0.95x
query55: Previous (792.0 ms) vs Current (830.0 ms) Diff -38 E2E 0.95x
query56: Previous (1739.5 ms) vs Current (1581.5 ms) Diff 158 E2E 1.10x
query57: Previous (2563.0 ms) vs Current (2175.5 ms) Diff 387 E2E 1.18x
query58: Previous (1789.0 ms) vs Current (1487.5 ms) Diff 301 E2E 1.20x
query59: Previous (3730.5 ms) vs Current (3619.0 ms) Diff 111 E2E 1.03x
query60: Previous (2174.5 ms) vs Current (2063.0 ms) Diff 111 E2E 1.05x
query61: Previous (2696.0 ms) vs Current (1973.0 ms) Diff 723 E2E 1.37x
query62: Previous (3330.0 ms) vs Current (2448.5 ms) Diff 881 E2E 1.36x
query63: Previous (1437.5 ms) vs Current (1325.5 ms) Diff 112 E2E 1.08x
query64: Previous (24634.0 ms) vs Current (23120.0 ms) Diff 1514 E2E 1.07x
query65: Previous (5418.0 ms) vs Current (5581.5 ms) Diff -163 E2E 0.97x
query66: Previous (3434.5 ms) vs Current (7234.5 ms) Diff -3800 E2E 0.47x
query67: Previous (37738.5 ms) vs Current (35666.0 ms) Diff 2072 E2E 1.06x
query68: Previous (2105.0 ms) vs Current (1944.5 ms) Diff 160 E2E 1.08x
query69: Previous (2657.0 ms) vs Current (2402.5 ms) Diff 254 E2E 1.11x
query70: Previous (2456.5 ms) vs Current (2448.0 ms) Diff 8 E2E 1.00x
query71: Previous (4602.5 ms) vs Current (4618.0 ms) Diff -15 E2E 1.00x
query72: Previous (5886.5 ms) vs Current (5975.5 ms) Diff -89 E2E 0.99x
query73: Previous (1553.0 ms) vs Current (1755.0 ms) Diff -202 E2E 0.88x
query74: Previous (5852.0 ms) vs Current (5355.0 ms) Diff 497 E2E 1.09x
query75: Previous (14516.0 ms) vs Current (12377.5 ms) Diff 2138 E2E 1.17x
query76: Previous (6977.5 ms) vs Current (6897.5 ms) Diff 80 E2E 1.01x
query77: Previous (1992.0 ms) vs Current (1682.5 ms) Diff 309 E2E 1.18x
query78: Previous (18316.0 ms) vs Current (18002.5 ms) Diff 313 E2E 1.02x
query79: Previous (2001.0 ms) vs Current (2156.5 ms) Diff -155 E2E 0.93x
query80: Previous (10406.0 ms) vs Current (8814.0 ms) Diff 1592 E2E 1.18x
query81: Previous (3803.0 ms) vs Current (3373.0 ms) Diff 430 E2E 1.13x
query82: Previous (4969.5 ms) vs Current (4904.0 ms) Diff 65 E2E 1.01x
query83: Previous (1288.0 ms) vs Current (1396.0 ms) Diff -108 E2E 0.92x
query84: Previous (3916.5 ms) vs Current (3071.5 ms) Diff 845 E2E 1.28x
query85: Previous (5241.5 ms) vs Current (3678.5 ms) Diff 1563 E2E 1.42x
query86: Previous (1406.0 ms) vs Current (1361.5 ms) Diff 44 E2E 1.03x
query87: Previous (3943.5 ms) vs Current (3607.0 ms) Diff 336 E2E 1.09x
query88: Previous (14866.5 ms) vs Current (12575.0 ms) Diff 2291 E2E 1.18x
query89: Previous (1594.5 ms) vs Current (1797.0 ms) Diff -202 E2E 0.89x
query90: Previous (4602.5 ms) vs Current (3509.5 ms) Diff 1093 E2E 1.31x
query91: Previous (2072.5 ms) vs Current (1410.5 ms) Diff 662 E2E 1.47x
query92: Previous (1120.0 ms) vs Current (1140.5 ms) Diff -20 E2E 0.98x
query93: Previous (16243.5 ms) vs Current (17749.0 ms) Diff -1505 E2E 0.92x
query94: Previous (8941.0 ms) vs Current (8247.5 ms) Diff 693 E2E 1.08x
query95: Previous (15119.0 ms) vs Current (14348.0 ms) Diff 771 E2E 1.05x
query96: Previous (6026.0 ms) vs Current (5921.5 ms) Diff 104 E2E 1.02x
query97: Previous (2724.0 ms) vs Current (2701.5 ms) Diff 22 E2E 1.01x
query98: Previous (2327.0 ms) vs Current (2247.5 ms) Diff 79 E2E 1.04x
query99: Previous (4096.0 ms) vs Current (3731.5 ms) Diff 364 E2E 1.10x
benchmark: Previous (573000.0 ms) vs Current (548500.0 ms) Diff 24500 E2E 1.04x

We have observed a 4% perf improvement for the e2e test.
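
For readers who want to re-derive the E2E column, the sketch below parses lines in the format of the table above and computes speedup as previous time divided by current time. The regular expression and function names are illustrative assumptions, not the actual NDS comparison script.

```python
import re

# Matches lines such as:
# "query1: Previous (2639.5 ms) vs Current (2431.5 ms) Diff 208 E2E 1.09x"
LINE_RE = re.compile(
    r"^(?P<name>\S+): Previous \((?P<prev>[\d.]+) ms\) "
    r"vs Current \((?P<cur>[\d.]+) ms\)"
)

def parse_line(line):
    """Return (name, previous_ms, current_ms) for one result line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    return match["name"], float(match["prev"]), float(match["cur"])

def speedup(prev_ms, cur_ms):
    """Speedup > 1 means the current run is faster than the previous one."""
    return prev_ms / cur_ms

# Three lines copied from the table above.
lines = [
    "query1: Previous (2639.5 ms) vs Current (2431.5 ms) Diff 208 E2E 1.09x",
    "query66: Previous (3434.5 ms) vs Current (7234.5 ms) Diff -3800 E2E 0.47x",
    "benchmark: Previous (573000.0 ms) vs Current (548500.0 ms) Diff 24500 E2E 1.04x",
]
for line in lines:
    name, prev_ms, cur_ms = parse_line(line)
    print(f"{name}: {speedup(prev_ms, cur_ms):.2f}x ({prev_ms - cur_ms:+.1f} ms)")
```

The `benchmark` line gives 573000.0 / 548500.0, which rounds to 1.04x and is where the overall 4% figure comes from. Individual queries vary widely around that total, from 0.47x for query66 to 1.51x for query6.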

mattahrens commented 14 hours ago

> We have observed 4% perf improvement for e2e test.

In what environment and for which NDS scale factor did you execute the benchmark?

liurenjie1024 commented 4 hours ago

> > We have observed 4% perf improvement for e2e test.
>
> In what environment and for which NDS scale factor did you execute the benchmark?

I ran it on spar2a with a 3k scale factor.