Closed: ankurdave closed this 10 years ago
This is a surprising result, but here's the benchmark data (wall-clock time per trial, 10 trials per branch):
master (4d880304867b55a4f2138617b30600b7fa013b14):
832.123000 s
887.333000 s
793.836000 s
801.045000 s
1138.437000 s
837.581000 s
842.121000 s
829.105000 s
860.752000 s
881.347000 s
Mean: 870.368 +/- 31.281 s (standard error of the mean)
mutable-rdd (3407a8fd2132d85289b6dd22e81ab37796067f0e):
368.209000 s
387.058000 s
383.664000 s
363.390000 s
357.073000 s
375.562000 s
322.679000 s
364.797000 s
386.379000 s
378.450000 s
Mean: 368.726 +/- 6.067 s (standard error of the mean)
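For anyone who wants to re-derive the summary lines, here is a minimal sketch in Python. It assumes the "+/-" figure is the standard error of the mean (sample standard deviation divided by sqrt(n)), which is what reproduces the reported values. The helper name `mean_and_sem` is only illustrative.

```python
import math
import statistics

# Per-trial runtimes in seconds, copied from the lists above.
master = [832.123, 887.333, 793.836, 801.045, 1138.437,
          837.581, 842.121, 829.105, 860.752, 881.347]
mutable_rdd = [368.209, 387.058, 383.664, 363.390, 357.073,
               375.562, 322.679, 364.797, 386.379, 378.450]


def mean_and_sem(samples):
    """Return (mean, standard error of the mean) for a list of runtimes.

    Assumption: the error is the sample standard deviation (n - 1
    denominator) divided by sqrt(n).
    """
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


for name, runs in [("master", master), ("mutable-rdd", mutable_rdd)]:
    mean, sem = mean_and_sem(runs)
    print(f"{name}: {mean:.3f} +/- {sem:.3f} s")
```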
I also ran one trial of PageRank on a different cluster and dataset yesterday. Original runtime was 589 s; with this PR it was 571 s.
This change breaks correctness in general, but it is OK on our benchmarks.
I benchmarked it for connected components on uk-union, and it cuts the runtime by about 57% (roughly 2.4x faster). I ran 10 trials of 20 iterations each on 16 m2.4xlarge machines, coalescing the input to 64 partitions. Original runtime was 870 +/- 31 s, and with this PR it was reduced to 369 +/- 6 s.
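The relationship between the two runtimes can be checked with a short sketch. It uses the rounded means quoted above; the function names are illustrative and not part of the PR.

```python
def runtime_reduction(before_s, after_s):
    """Fraction of the original runtime that was eliminated."""
    return (before_s - after_s) / before_s


def speedup_factor(before_s, after_s):
    """How many times faster the new runtime is than the original."""
    return before_s / after_s


before_s, after_s = 870.0, 369.0  # rounded mean runtimes in seconds
print(f"runtime reduction: {runtime_reduction(before_s, after_s):.1%}")  # 57.6%
print(f"speedup factor: {speedup_factor(before_s, after_s):.2f}x")       # 2.36x
```

The two numbers describe the same result in different ways: a runtime reduction just under 58% corresponds to a throughput improvement of about 2.4x.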