The Flink job completes checkpoint 1 and commits the transaction with label label1.
The StarRocks FE leader restarts for some reason, such as an upgrade, and the Flink job fails because FE is down.
After FE has restarted, the Flink job restores from checkpoint 1 and re-commits the transaction with label label1, but FE returns a failed status with an error message like UserException: transaction with op commit label 154968ac-c52b-4ae9-8fdf-1df64f285b96 has no backend (see TransactionLoadAction#executeTransaction for details).
The commit failure makes the Flink job fail again, and the job ends up in an endless restart loop, even though the transaction was already committed successfully before the restart and the job should run normally.
The solution is to check the label state whenever a commit fails, regardless of the reason. If the label state is COMMITTED or VISIBLE, the commit is treated as successful. This reduces the dependency on the specific behavior of StarRocks.
Additionally, this PR improves some error messages.
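The check described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `TransactionClient`, `commitIdempotently`, and the `OTHER` state are hypothetical names introduced for the example. Only the COMMITTED and VISIBLE states come from the description.

```java
import java.util.EnumSet;
import java.util.Set;

public class CommitWithLabelCheck {

    /** COMMITTED and VISIBLE are from the PR description; OTHER is a placeholder for every other state. */
    enum LabelState { COMMITTED, VISIBLE, OTHER }

    /** Hypothetical stand-in for the client that talks to FE (an assumption, not the connector's API). */
    interface TransactionClient {
        boolean commit(String label) throws Exception;
        LabelState getLabelState(String label) throws Exception;
    }

    private static final Set<LabelState> ALREADY_COMMITTED =
            EnumSet.of(LabelState.COMMITTED, LabelState.VISIBLE);

    /**
     * Tries to commit. If the commit fails for any reason, looks up the label state
     * and treats COMMITTED or VISIBLE as success, so a re-commit after restore is harmless.
     */
    static boolean commitIdempotently(TransactionClient client, String label) {
        String failure;
        try {
            if (client.commit(label)) {
                return true;
            }
            failure = "FE returned a failed status";
        } catch (Exception e) {
            failure = String.valueOf(e.getMessage());
        }

        // The commit failed; do not interpret the error, ask for the label state instead.
        LabelState state;
        try {
            state = client.getLabelState(label);
        } catch (Exception e) {
            throw new IllegalStateException(
                    "Commit of label " + label + " failed (" + failure
                            + ") and the label state could not be fetched", e);
        }
        if (ALREADY_COMMITTED.contains(state)) {
            return true; // the transaction was committed by an earlier attempt
        }
        throw new IllegalStateException(
                "Commit of label " + label + " failed (" + failure
                        + "), label state is " + state);
    }
}
```

Because the decision depends only on the label state, the retry logic does not need to recognize particular FE error messages such as the "has no backend" one above.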
Checklist:
[X] I have added test cases for my bug fix or my new feature
[ ] I have added user document for my new feature or new function
What type of PR is this:
Which issues does this PR fix:
Problem Summary(Required) :