MemVerge / splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Apache License 2.0

[GH-24] Dump partition when read error happens #25

Closed. jealous closed this 5 years ago.

jealous commented 5 years ago

When a shuffle read error happens, dump the current partition to a local temp folder so the developer can diagnose problems such as data corruption.

Make dump a utility function in the resolver and dump the partition whenever an exception is caught in SplashShuffleFetcherIterator. Dump files are named like shuffle_0_1_2.dump.
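A minimal sketch of what such a dump utility could look like, not the code from this PR. The object name `DumpSketch`, the method signatures, and the reading of the three numbers in `shuffle_0_1_2.dump` as shuffle, map, and reduce IDs are all assumptions made for illustration.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

object DumpSketch {
  // Hypothetical helper: builds a name such as shuffle_0_1_2.dump.
  // Treating the numbers as shuffle, map, and reduce IDs is an assumption.
  def dumpFileName(shuffleId: Int, mapId: Int, reduceId: Int): String =
    s"shuffle_${shuffleId}_${mapId}_${reduceId}.dump"

  // Copies the partition bytes into dumpDir and returns the dump file path.
  // The input stream is always closed, even if the copy fails.
  def dump(in: InputStream, dumpDir: Path,
           shuffleId: Int, mapId: Int, reduceId: Int): Path = {
    Files.createDirectories(dumpDir)
    val target = dumpDir.resolve(dumpFileName(shuffleId, mapId, reduceId))
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    target
  }
}
```

Keeping the file name derivation separate from the copy makes the naming scheme easy to check without touching the file system.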

Add a dump call in SplashShuffleReader to dump the partition if an error happens while inserting records.
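The dump-on-error pattern described here can be sketched as a small wrapper. This is an illustration under assumptions, not the PR's implementation: `DumpOnErrorSketch` and `withDumpOnError` are hypothetical names, and the `dump` callback stands in for whatever the resolver actually exposes.

```scala
import scala.util.control.NonFatal

object DumpOnErrorSketch {
  // Runs `body`; if it fails with a non-fatal error, calls `dump` and then
  // rethrows the original exception so the task still fails as before.
  def withDumpOnError[T](dump: () => Unit)(body: => T): T =
    try body
    catch {
      case NonFatal(e) =>
        // A failure while dumping must not hide the original error,
        // so it is attached as a suppressed exception instead.
        try dump()
        catch { case NonFatal(d) => e.addSuppressed(d) }
        throw e
    }
}
```

Rethrowing the original exception keeps the failure behavior unchanged. The dump is only a side effect for later diagnosis.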

This closes GH-24.

codecov[bot] commented 5 years ago

Codecov Report

Merging #25 into master will decrease coverage by 0.64%. The diff coverage is 51.72%.


@@             Coverage Diff              @@
##             master      #25      +/-   ##
============================================
- Coverage     76.07%   75.42%   -0.65%     
- Complexity      430      436       +6     
============================================
  Files            30       30              
  Lines          1981     2035      +54     
  Branches        325      332       +7     
============================================
+ Hits           1507     1535      +28     
- Misses          256      274      +18     
- Partials        218      226       +8

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...e/spark/shuffle/SplashShuffleFetcherIterator.scala | 47.61% <40%> (-10.72%) | 8 <4> (+2) |
| ...org/apache/spark/shuffle/SplashShuffleReader.scala | 77.58% <45.45%> (-8.13%) | 8 <1> (+1) |
| ...che/spark/shuffle/SplashShuffleBlockResolver.scala | 77.38% <62.96%> (-2.77%) | 35 <2> (+2) |
| .../apache/spark/shuffle/local/LocalShuffleFile.scala | 48.57% <0%> (+5.71%) | 14% <0%> (+1%) :arrow_up: |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 381697a...d62a5cb.

jealous commented 5 years ago

Yes, I just updated the code so that the user can configure the folder the files are dumped to.
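As a rough illustration of a configurable dump folder, the sketch below reads the folder from a plain key-value map and falls back to the JVM temp directory. The configuration key `spark.shuffle.splash.dumpFolder` and the object name are hypothetical. The actual option name is not stated in this thread.

```scala
import java.nio.file.{Path, Paths}

object DumpFolderSketch {
  // Hypothetical configuration key; the real option name is not given here.
  val DumpFolderKey = "spark.shuffle.splash.dumpFolder"

  // Returns the configured dump folder, or the JVM temp directory
  // when the key is missing or blank.
  def resolveDumpFolder(conf: Map[String, String]): Path =
    conf.get(DumpFolderKey)
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(p => Paths.get(p))
      .getOrElse(Paths.get(System.getProperty("java.io.tmpdir")))
}
```

Falling back to the temp directory matches the "temp local folder" default mentioned in the PR description, while letting users redirect dumps to a larger or persistent disk.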