[CELEBORN-1719] Introduce celeborn.client.spark.stageRerun.enabled with alternative celeborn.client.spark.fetch.throwsFetchFailure to enable spark stage rerun #2920
What changes were proposed in this pull request?
Introduce celeborn.client.spark.stageRerun.enabled, with celeborn.client.spark.fetch.throwsFetchFailure as its alternative key, to enable Spark stage rerun.
Change the default value of celeborn.client.spark.fetch.throwsFetchFailure from false to true, which enables Spark stage rerun by default.
Why are the changes needed?
Users cannot readily tell from the name celeborn.client.spark.fetch.throwsFetchFailure that it controls whether stage rerun is enabled. When it is true, the client throws FetchFailedException instead of CelebornIOException, and Spark responds to a FetchFailedException by rerunning the affected stage. Introducing celeborn.client.spark.stageRerun.enabled, with celeborn.client.spark.fetch.throwsFetchFailure kept as its alternative, makes that purpose explicit.
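The fallback implied by "with alternative" can be sketched as follows. This is a minimal illustration under stated assumptions, not Celeborn's actual configuration code: the class `StageRerunConf` and the method `stageRerunEnabled` are hypothetical, and the lookup order (new key first, then the older key, then the default) is an assumption about how an alternative key is resolved. The default of true follows the description above.

```java
import java.util.Map;

// Hypothetical helper, not part of Celeborn: shows one way a config key with
// an older alternative name could be resolved.
final class StageRerunConf {
    static final String KEY = "celeborn.client.spark.stageRerun.enabled";
    static final String ALTERNATIVE_KEY = "celeborn.client.spark.fetch.throwsFetchFailure";
    // Default taken from the PR description (stage rerun enabled by default).
    static final boolean DEFAULT = true;

    private StageRerunConf() {}

    // Assumed precedence: the new key wins, the alternative key is consulted
    // only when the new key is absent, and the default applies when both are absent.
    static boolean stageRerunEnabled(Map<String, String> conf) {
        String value = conf.get(KEY);
        if (value == null) {
            value = conf.get(ALTERNATIVE_KEY);
        }
        return value == null ? DEFAULT : Boolean.parseBoolean(value.trim());
    }
}
```

Under these assumptions, a job that still sets only celeborn.client.spark.fetch.throwsFetchFailure keeps its existing behavior, while a job that sets both keys follows celeborn.client.spark.stageRerun.enabled.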
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI.