flink-extended / flink-remote-shuffle

Remote Shuffle Service for Flink
Apache License 2.0

Job can switch to new shuffle cluster only when the old cluster is unavailable #47

Closed wsry closed 2 years ago

wsry commented 2 years ago

Motivation

Currently, a running job switches to a new shuffle cluster whenever the new cluster has a higher version. As a result, the shuffle data of a single job can end up being managed by multiple clusters, which is complicated and error-prone. By switching to a new cluster only when the old cluster is unavailable, we can simplify the logic and also support upgrades that do not affect running jobs, provided the old cluster is kept until all running jobs finish.

Changes

A job switches to a new shuffle cluster only when the old cluster is unavailable.
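
The rule can be illustrated with a minimal sketch. This is not the actual implementation in this PR: `ShuffleClusterSelector`, `Cluster`, and `select` are hypothetical names, and availability is modeled as a plain boolean flag. The point it shows is that the version of the candidate cluster no longer influences the decision while the current cluster is still available.

```java
import java.util.Optional;

/** Hypothetical sketch; names are illustrative and not taken from flink-remote-shuffle. */
final class ShuffleClusterSelector {

    /** Simplified view of a shuffle cluster (assumed fields). */
    static final class Cluster {
        final String id;
        final int version;
        final boolean available;

        Cluster(String id, int version, boolean available) {
            this.id = id;
            this.version = version;
            this.available = available;
        }
    }

    /**
     * Picks the cluster a running job should use.
     * The job stays on the current cluster as long as it is available,
     * even if the candidate has a higher version.
     */
    static Optional<Cluster> select(Cluster current, Cluster candidate) {
        if (current != null && current.available) {
            return Optional.of(current); // never switch away from a live cluster
        }
        if (candidate != null && candidate.available) {
            return Optional.of(candidate); // old cluster is gone, so fail over
        }
        return Optional.empty(); // no usable cluster at the moment
    }
}
```

With this rule, an operator can start a new cluster alongside the old one and retire the old cluster only after its running jobs have finished, so each job's shuffle data stays within a single cluster.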

Test