Closed stevendanna closed 1 year ago
CHANGEFEEDS send data to external systems (sinks) over the network. Over the life of a changefeed, it is likely that attempts to send data to a sink will fail with network errors, cloud permissions problems, server-level errors from the given sink, and more. For most errors, we want to retry these external requests as the issue is most likely transient. Currently, we have 3 ways in which a changefeed may be retried:
We've hit a number of problems recently with our current retries:
Addressing all of these issues will likely require a number of changes. In a recent conversation, we discussed some initial improvements we could make:
From there, we can improve our ability to recognise fatal errors and explicitly mark them as fatal, so that they are not retried. Improving retries in the Kafka sink may further require that we rely less on the sarama library.
Jira issue: CRDB-7181