Motivation
Currently, a running job switches to a new shuffle cluster whenever the new cluster has a higher version. As a result, the shuffle data of a single job may be managed by multiple clusters, which is complicated and error-prone. By switching to a new cluster only when the old cluster is unavailable, we can simplify the logic and also support upgrades that do not affect running jobs, provided the old cluster is kept until all running jobs finish.
Changes
A job switches to a new shuffle cluster only when the old cluster is unavailable.
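The rule can be illustrated with a minimal sketch. This is not the code in this change: `ShuffleClusterSelector`, `Cluster`, and `select` are hypothetical names, and availability is modeled as a plain boolean for illustration.

```java
/** Hypothetical sketch of the selection rule; not the actual implementation. */
final class ShuffleClusterSelector {

    /** Illustrative stand-in for a shuffle cluster as seen by a running job. */
    static final class Cluster {
        final String id;
        final boolean available;

        Cluster(String id, boolean available) {
            this.id = id;
            this.available = available;
        }
    }

    /**
     * Returns the cluster the job should use.
     * The job stays on its current cluster while that cluster is available,
     * regardless of the candidate's version. It moves to the candidate only
     * when the current cluster is unavailable (or the job has none yet).
     */
    static Cluster select(Cluster current, Cluster candidate) {
        if (current != null && current.available) {
            return current; // never switch away from a healthy cluster
        }
        if (candidate != null && candidate.available) {
            return candidate; // old cluster is gone, fall over to the new one
        }
        return current; // nothing usable to switch to; keep the current reference
    }
}
```

Under this rule, a version comparison no longer drives the switch, so the shuffle data of a job stays with one cluster for as long as that cluster is reachable.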
Test