flink-extended / flink-remote-shuffle

Remote Shuffle Service for Flink
Apache License 2.0

[FRS-53] Not trigger fatal error if Flink job has finished when notifying new shuffle manager address #54

Closed wsry closed 2 years ago

wsry commented 2 years ago

What is the purpose of the change

This solves #53. Currently, if a job has finished (whether it failed or not) and the shuffle manager leader changes at the same time, the notification of the new leader may cause a fatal error, which may lead to JM failover. This mainly affects session clusters in which more than one job shares the same cluster.
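For illustration only, the kind of guard described above could look roughly like the sketch below. This is not the code from this PR: the class `ShuffleClientSketch` and all of its methods are hypothetical names, and the sketch only assumes that the client can tell whether its job has already finished when a new shuffle manager address arrives.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names are taken from flink-remote-shuffle. */
final class ShuffleClientSketch {
    // Set once the job reaches a terminal state (finished, failed or cancelled).
    private final AtomicBoolean jobFinished = new AtomicBoolean(false);
    // Address of the shuffle manager leader currently in use.
    private volatile String managerAddress;

    void markJobFinished() {
        jobFinished.set(true);
    }

    String getManagerAddress() {
        return managerAddress;
    }

    /**
     * Called when a new shuffle manager leader is announced.
     * Returns true if the address was applied, false if it was ignored.
     */
    boolean onNewShuffleManagerAddress(String address) {
        if (jobFinished.get()) {
            // The job is already done, so a leader change is irrelevant to it.
            // Ignore the notification instead of escalating to a fatal error.
            return false;
        }
        managerAddress = address;
        return true;
    }

    public static void main(String[] args) {
        ShuffleClientSketch client = new ShuffleClientSketch();
        System.out.println(client.onNewShuffleManagerAddress("host-a:1234"));
        client.markJobFinished();
        System.out.println(client.onNewShuffleManagerAddress("host-b:1234"));
    }
}
```

In this sketch a late notification is simply dropped, so a finished job in a session cluster cannot bring down the JM that other jobs still depend on.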

Brief change log

Verifying this change

This change added tests.