Firestorm is a Remote Shuffle Service that lets Apache Spark and Apache Hadoop MapReduce applications store shuffle data on remote servers.
[Improvement] Reduce the recomputation caused by bad node #169
What changes were proposed in this pull request?
When MRAppMaster finds that a node is bad and that node has already executed some map tasks, MRAppMaster recomputes those map tasks. This is not necessary with RSS, because RSS does not store any shuffle data on those nodes. With this change, a bad node no longer triggers recomputation.
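The decision described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: `BadNodePolicy`, `CompletedMap`, `tasksToRerun`, and the `shuffleIsRemote` flag are hypothetical names, not Hadoop or Firestorm APIs, and the actual patch may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the decision; these names are illustrative
// and are not Hadoop or Firestorm APIs.
final class BadNodePolicy {

    // A completed map task and the node it ran on.
    static final class CompletedMap {
        final String taskId;
        final String nodeId;

        CompletedMap(String taskId, String nodeId) {
            this.taskId = taskId;
            this.nodeId = nodeId;
        }
    }

    // Returns the ids of completed map tasks that must be rerun
    // after badNodeId has been reported as a bad node.
    static List<String> tasksToRerun(List<CompletedMap> completed,
                                     String badNodeId,
                                     boolean shuffleIsRemote) {
        List<String> rerun = new ArrayList<>();
        if (shuffleIsRemote) {
            // Map output lives on the shuffle servers, not on the bad node,
            // so nothing is lost and nothing needs to be recomputed.
            return rerun;
        }
        for (CompletedMap m : completed) {
            if (m.nodeId.equals(badNodeId)) {
                // Local shuffle: output on the bad node is unreachable, so recompute.
                rerun.add(m.taskId);
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        List<CompletedMap> completed = new ArrayList<>();
        completed.add(new CompletedMap("map_0", "node-a"));
        completed.add(new CompletedMap("map_1", "node-b"));
        System.out.println(tasksToRerun(completed, "node-a", false)); // [map_0]
        System.out.println(tasksToRerun(completed, "node-a", true));  // []
    }
}
```

In this model, the local-shuffle case reruns only the map tasks that ran on the bad node, while the remote-shuffle case reruns none of them.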
Why are the changes needed?
To reduce unnecessary recomputation. Recomputation can also cause reduce tasks to fail because of lost events.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested in our cluster.