Open guozhangwang opened 3 years ago
Context from slack: https://confluent.slack.com/archives/C9Z794XSL/p1626364202430400
Related issue: https://github.com/confluentinc/ksql/issues/7622
copying over my thoughts from slack:
We shouldn't make max-retries configurable - if you skip a command you'll wind up with a different set of streams and tables than you should have. Instead we should make the replay more robust by either:
Of these I think the first option is simpler to do.
Also in either case we should have some way to have the user explicitly ask for a given command to be skipped - but this should be something they configure explicitly for that command and they should understand that they may wind up with a totally different set of streams/tables if they do this. This could either be a list of offsets to skip, or we just give them a tool to truncate the command topic.
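To make the "list of offsets to skip" idea concrete, here is a minimal sketch of a replay filter. Everything in it is hypothetical: `SkipOffsetsReplay`, `QueuedCommand`, and `selectForReplay` are illustrative names, not existing ksqlDB classes or configs, and the real replay path would read from the command topic rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: hypothetical names, not part of ksqlDB. */
final class SkipOffsetsReplay {

  /** Hypothetical stand-in for a record read from the command topic. */
  static final class QueuedCommand {
    final long offset;
    final String statement;

    QueuedCommand(long offset, String statement) {
      this.offset = offset;
      this.statement = statement;
    }
  }

  private SkipOffsetsReplay() {}

  /**
   * Returns the commands to execute during replay, in their original order,
   * leaving out any command whose offset the user explicitly asked to skip.
   */
  static List<QueuedCommand> selectForReplay(
      List<QueuedCommand> commands, Set<Long> skipOffsets) {
    List<QueuedCommand> selected = new ArrayList<>();
    for (QueuedCommand command : commands) {
      if (skipOffsets.contains(command.offset)) {
        // Skipping is an explicit user decision, so make it visible in the logs.
        System.err.println("WARN: skipping command at offset " + command.offset);
        continue;
      }
      selected.add(command);
    }
    return selected;
  }
}
```

Because the skip list is keyed by offset, a skip applies to exactly one command and nothing is dropped silently, which matches the requirement that the user opts in per command.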
Today we always pass in Integer.MAX_VALUE all the way down to CommandRunner, where it is used as the retry limit in executeStatement.
The assumption behind this is that if a cmd is successfully written to the cmd topic, it is 100% valid and safe, so its execution should not fail. However, there are still scenarios where the execution can fail consistently (see an example stacktrace below), in which case retrying indefinitely is not the right approach.
If we cannot guarantee that the above assumption always holds, then we should consider making the retries configurable, and when the retries are exhausted, treat it as a fatal error and handle it (e.g. delete the corresponding cmd from the topic and report it to the client).
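As a rough illustration of the configurable-retries proposal, the sketch below bounds the number of attempts and raises a fatal error once the budget is exhausted. It is not the actual CommandRunner code: `BoundedRetry`, `runWithRetries`, and `RetriesExhaustedException` are hypothetical names, and a real implementation would presumably add backoff between attempts and decide how the fatal error is reported.

```java
import java.util.Objects;

/** Illustrative only: a hypothetical helper, not ksqlDB's CommandRunner. */
final class BoundedRetry {

  /** Signals that a command kept failing until the retry budget ran out. */
  static final class RetriesExhaustedException extends RuntimeException {
    RetriesExhaustedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  private BoundedRetry() {}

  /**
   * Runs the command once, then retries up to maxRetries more times on failure.
   * Passing Integer.MAX_VALUE approximates the current retry-forever behaviour.
   */
  static void runWithRetries(Runnable command, int maxRetries) {
    Objects.requireNonNull(command, "command");
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    long failures = 0; // long, so the counter cannot overflow near Integer.MAX_VALUE
    while (true) {
      try {
        command.run();
        return;
      } catch (RuntimeException e) {
        failures++;
        if (failures > maxRetries) {
          // Budget exhausted: surface a fatal error instead of looping forever.
          throw new RetriesExhaustedException(
              "command failed after " + failures + " attempts", e);
        }
      }
    }
  }
}
```

The caller that catches `RetriesExhaustedException` is where the handling discussed above would go, for example reporting the failure to the client.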