Closed cxzl25 closed 3 weeks ago
I have a question: should we retry fetching from another replica before throwing a FetchFailedException when the conf celeborn.client.push.replicate.enabled is set to true?
This is not necessarily safe, because the Task may have read part of the data, so it is safer to retry the Task. This is how Spark handles it.
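To illustrate the concern, here is a toy sketch in Python. It is not Celeborn or Spark code, and every name in it (`FetchFailed`, `read_replica`, `naive_fallback`, `task_retry`) is hypothetical. It assumes the reader hands records downstream as they arrive, and that a fallback restarts the replica from the beginning.

```python
class FetchFailed(Exception):
    """Stand-in for a fetch failure; not the real FetchFailedException."""


def read_replica(records, fail_after=None):
    """Yield records from one replica, failing after `fail_after` records if set."""
    for i, rec in enumerate(records):
        if fail_after is not None and i == fail_after:
            raise FetchFailed(f"lost connection after {i} records")
        yield rec


def naive_fallback(primary, replica, fail_after):
    """Unsafe: on failure, switch to the replica inside the same task attempt."""
    consumed = []
    try:
        for rec in read_replica(primary, fail_after):
            consumed.append(rec)  # downstream has already seen this record
    except FetchFailed:
        for rec in read_replica(replica):
            consumed.append(rec)  # records read before the failure are seen again
    return consumed


def task_retry(primary, replica, fail_after):
    """Safer: drop the failed attempt's partial output and rerun the whole task."""
    try:
        return list(read_replica(primary, fail_after))
    except FetchFailed:
        return list(read_replica(replica))  # fresh attempt, no partial state kept


data = ["a", "b", "c", "d"]
dup = naive_fallback(data, data, fail_after=2)
clean = task_retry(data, data, fail_after=2)
```

In this toy model `dup` contains the first two records twice, while `clean` matches the original data. That is why failing the task and letting the scheduler retry it is the conservative choice.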
Support for checksum/validation of data would be a good feature.
It looks like we've already done this.
Thank you, merging to main(v0.6.0)/branch-0.5(v0.5.2)/branch-0.4(v0.4.3).
What changes were proposed in this pull request?
Why are the changes needed?
https://github.com/apache/celeborn/pull/2655#pullrequestreview-2213124224
Does this PR introduce any user-facing change?
No
How was this patch tested?
GA