MemVerge / splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Apache License 2.0

[GH-49] Shuffle performance tool. #50

Closed by jealous 5 years ago

jealous commented 5 years ago

Create a shuffle performance tool that allows the user to quickly verify the performance of the storage factory without starting a Spark application.

Here is a sample command that starts a test with the default configuration:

java -cp splash-shaded.jar com.memverge.splash.ShufflePerfTool

You can pass options to configure the test.

Sample command with options:

java -cp target/splash-0.6.1-shaded.jar com.memverge.splash.ShufflePerfTool -d 256 -m 50 -r 50 -t 4 -o

Sample output:

==========================================
Writing 50 shuffle outputs: 100% (50/50)
Write shuffle data completed in 5029 milliseconds
    storage factory:     com.memverge.splash.shared.SharedFSFactory
    shuffle folder:      \tmp\splash\shuffleTest-1\shuffle
    number of mappers:   50
    number of reducers:  50
    total shuffle size:  3GB
    bytes written:       3GB
    bytes read:          0B
    number of blocks:    256
    blocks size:         262KB
    partition size:      1MB
    concurrent tasks:    4
    bandwidth:           667MB/s

==========================================
Reading 2500 partitions:  100% (2500/2500)
Read shuffle data completed in 1813 milliseconds
    Reading index file:  198 ms
    storage factory:     com.memverge.splash.shared.SharedFSFactory
    shuffle folder:      \tmp\splash\shuffleTest-1\shuffle
    number of mappers:   50
    number of reducers:  50
    total shuffle size:  3GB
    bytes written:       3GB
    bytes read:          3GB
    number of blocks:    256
    blocks size:         262KB
    partition size:      1MB
    concurrent tasks:    4
    bandwidth:           1GB/s
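
The derived figures in the sample output can be cross-checked by hand. The sketch below is not part of the tool. It assumes that each mapper writes the reported 256 blocks, that the "262KB" block size is exactly 262144 bytes, and that the tool prints decimal units and truncates them. None of these assumptions is confirmed by the PR description; they are inferred from the numbers above. The class and method names are hypothetical.

```java
public class ShuffleBandwidthCheck {
    // Assumption: "blocks size: 262KB" means 262144 bytes (256 * 1024).
    static final long BLOCK_SIZE_BYTES = 262_144L;

    // Total shuffle bytes, assuming every mapper writes the same number of blocks.
    static long totalBytes(int mappers, int blocksPerMapper, long blockSizeBytes) {
        return (long) mappers * blocksPerMapper * blockSizeBytes;
    }

    // Throughput in bytes per second for a transfer that took `millis` milliseconds.
    static long bytesPerSecond(long bytes, long millis) {
        return bytes * 1000L / millis;
    }

    public static void main(String[] args) {
        int mappers = 50;
        int reducers = 50;
        long total = totalBytes(mappers, 256, BLOCK_SIZE_BYTES);

        System.out.println("total bytes:     " + total);
        System.out.println("partition bytes: " + total / ((long) mappers * reducers));
        // Timings taken from the sample output: 5029 ms write, 1813 ms read.
        System.out.println("write MB/s:      " + bytesPerSecond(total, 5029) / 1_000_000);
        System.out.println("read MB/s:       " + bytesPerSecond(total, 1813) / 1_000_000);
    }
}
```

Under these assumptions the write throughput comes out at 667 MB/s, matching the report, and the read throughput at 1850 MB/s, which the report appears to truncate to "1GB/s". The same truncation would explain "3GB" for roughly 3.36 GB of shuffle data and "1MB" for a partition of about 1.34 MB.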

codecov[bot] commented 5 years ago

Codecov Report

Merging #50 into master will decrease coverage by 0.32%. The diff coverage is 79.87%.

@@             Coverage Diff              @@
##             master      #50      +/-   ##
============================================
- Coverage     79.51%   79.19%   -0.33%     
+ Complexity      475      473       -2     
============================================
  Files            31       32       +1     
  Lines          1958     2115     +157     
  Branches        313      333      +20     
============================================
+ Hits           1557     1675     +118     
- Misses          203      227      +24     
- Partials        198      213      +15

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...n/scala/org/apache/spark/shuffle/SplashUtils.scala | 67.92% <100%> (ø) | 0 <0> (ø) | :arrow_down: |
| ...java/com/memverge/splash/StorageFactoryHolder.java | 86% <100%> (+1.21%) | 14 <1> (+1) | :arrow_up: |
| ...in/scala/com/memverge/splash/ShufflePerfTool.scala | 79.08% <79.08%> (ø) | 0 <0> (?) | |
| ...in/scala/org/apache/spark/shuffle/SplashOpts.scala | 66.12% <0%> (-12.91%) | 0% <0%> (ø) | |
| src/main/java/com/memverge/splash/TempFolder.java | 77.55% <0%> (+2.04%) | 14% <0%> (+1%) | :arrow_up: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c5c2ac5...f5da9f0.

codecov[bot] commented 5 years ago

Codecov Report

Merging #50 into master will increase coverage by 0.26%. The diff coverage is 82.48%.

@@             Coverage Diff              @@
##             master      #50      +/-   ##
============================================
+ Coverage     79.51%   79.78%   +0.26%     
- Complexity      475      477       +2     
============================================
  Files            31       32       +1     
  Lines          1958     2172     +214     
  Branches        313      341      +28     
============================================
+ Hits           1557     1733     +176     
- Misses          203      221      +18     
- Partials        198      218      +20

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...n/scala/org/apache/spark/shuffle/SplashUtils.scala | 67.92% <100%> (ø) | 0 <0> (ø) | :arrow_down: |
| ...java/com/memverge/splash/StorageFactoryHolder.java | 86% <100%> (+1.21%) | 14 <1> (+1) | :arrow_up: |
| ...om/memverge/splash/shared/SharedFSShuffleFile.java | 81.57% <100%> (+4.91%) | 11 <0> (+1) | :arrow_up: |
| ...in/scala/com/memverge/splash/ShufflePerfTool.scala | 81.18% <81.18%> (ø) | 0 <0> (?) | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c5c2ac5...503a5a3.