StarRocks / starrocks-connector-for-apache-flink

Apache License 2.0

[Bugfix] Check label state if fail to commit because FE restarts #195

Closed banmoy closed 1 year ago

banmoy commented 1 year ago

What type of PR is this:

Which issues of this PR fixes :

Problem Summary(Required) :

The case is:

  1. The Flink job completes checkpoint 1 and commits the transaction with label label1.
  2. The StarRocks FE leader restarts for some reason, such as an upgrade, and the Flink job fails because the FE is down.
  3. After the FE has restarted, the Flink job restores from checkpoint 1 and re-commits the transaction with label label1, but the FE returns a failed status with an error message like UserException: transaction with op commit label 154968ac-c52b-4ae9-8fdf-1df64f285b96 has no backend (see TransactionLoadAction#executeTransaction for details).
  4. The commit failure makes the Flink job fail again, and the job enters an endless restart loop. However, the transaction was actually committed successfully before the restart, so the job should be able to run normally.

The solution is to check the label state whenever a commit fails, regardless of the reason. If the label state is COMMITTED or VISIBLE, the commit is treated as successful. This reduces the connector's dependency on the specific behavior of StarRocks.
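
The idea can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `TransactionClient`, `commitIdempotently`, and the catch-all `OTHER` state are hypothetical names introduced here. Only the COMMITTED and VISIBLE states come from the description above.

```java
import java.util.EnumSet;
import java.util.Set;

public class CommitWithLabelCheck {

    /** COMMITTED and VISIBLE are from the PR text; OTHER is a hypothetical catch-all. */
    public enum LabelState { COMMITTED, VISIBLE, OTHER }

    /** Hypothetical client abstraction, assumed for this sketch only. */
    public interface TransactionClient {
        /** Returns true if the FE reports the commit as successful. */
        boolean commit(String label) throws Exception;

        /** Looks up the current state of the given label. */
        LabelState getLabelState(String label) throws Exception;
    }

    // Label states that mean the transaction was already committed.
    private static final Set<LabelState> SUCCESS_STATES =
            EnumSet.of(LabelState.COMMITTED, LabelState.VISIBLE);

    /**
     * Tries to commit; if that fails for any reason, falls back to the
     * label state and treats COMMITTED or VISIBLE as a successful commit.
     */
    public static boolean commitIdempotently(TransactionClient client, String label) {
        try {
            if (client.commit(label)) {
                return true;
            }
        } catch (Exception e) {
            // Ignore the cause here: the label state decides the outcome.
        }
        try {
            return SUCCESS_STATES.contains(client.getLabelState(label));
        } catch (Exception e) {
            // The state could not be determined, so report the commit as failed.
            return false;
        }
    }
}
```

With this shape, a re-commit of label1 after an FE restart succeeds as long as the label lookup reports COMMITTED or VISIBLE, even if the commit request itself returns an error such as "has no backend".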

Additionally, this PR improves some error messages.

Checklist: