Closed wsry closed 2 years ago
Hi @wsry Is this also solved by https://github.com/flink-extended/flink-remote-shuffle/pull/49 ?
This works together with #49. When there are multiple clusters, a job can select the proper one according to some strategy. Currently, the strategy always selects the RSS cluster with the highest version. In the future, we can implement more sophisticated strategies, such as automatically switching some jobs to the new version for grayscale testing.
What is the purpose of the change
This solves #47. Currently, a running job switches to the new shuffle cluster if the new one has a higher version. However, this may cause the shuffle data of one job to be managed by multiple clusters, which is complicated and error-prone. By switching to a new cluster only when the old cluster is unavailable, we can simplify the logic and also support upgrading without affecting running jobs, provided the old cluster is kept until all running jobs finish.
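For illustration only, the selection rule described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `ShuffleCluster`, `ClusterSelector`, and both method names are hypothetical, and representing the version as a single `int` is an assumption made to keep the comparison simple.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical view of one remote shuffle cluster; not the project's real type. */
final class ShuffleCluster {
    final String id;
    final int version;       // assumption: versions are comparable integers
    final boolean available; // assumption: availability is already known to the caller

    ShuffleCluster(String id, int version, boolean available) {
        this.id = id;
        this.version = version;
        this.available = available;
    }
}

/** Hypothetical helper showing the two selection rules discussed in this PR. */
final class ClusterSelector {
    private ClusterSelector() {}

    /** For a job without a cluster yet: pick the available cluster with the highest version. */
    static Optional<ShuffleCluster> selectNewest(List<ShuffleCluster> clusters) {
        return clusters.stream()
                .filter(c -> c.available)
                .max(Comparator.comparingInt((ShuffleCluster c) -> c.version));
    }

    /**
     * For a running job: stay on the current cluster while it is available,
     * and fall back to the newest available cluster only when it is not.
     */
    static Optional<ShuffleCluster> selectForRunningJob(
            ShuffleCluster current, List<ShuffleCluster> clusters) {
        if (current != null && current.available) {
            return Optional.of(current);
        }
        return selectNewest(clusters);
    }
}
```

With this rule, a running job never has its shuffle data spread across two healthy clusters, while newly started jobs still pick the higher version.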
Brief change log
Verifying this change
This change added tests.