cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

cdc: Export should emit "done" indicator #110985

Open miretskiy opened 1 year ago

miretskiy commented 1 year ago

It would be useful if CDC export emitted a "done" indicator. This could be as simple as allowing the "resolved" option to be used with export, so that a final "resolved" marker file or message is emitted before the changefeed terminates.

Jira issue: CRDB-31703

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/cdc

733amir commented 1 year ago

I'm not sure what the issue is, but I can help solve it.

miretskiy commented 1 year ago

When you start a CDC export (e.g. CREATE CHANGEFEED INTO s3... WITH initial_scan=only), the only way to know when it completes is to query the job status. But you may not want consumers to depend on that (they may not even know the job ID). You may want a consumer to be able to determine whether the export has finished just by looking at the S3 bucket/directory, and right now that is very hard to tell. So the idea is to emit a marker file, "export.done" or some such, to indicate that the export completed, so that the consumer can simply watch the directory until the file shows up.

One way to accomplish this is to allow the "resolved" option to be used when the initial_scan=only option is specified.
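
To illustrate the consumer side of this idea, here is a minimal sketch of a watcher that polls for the marker file. It is only an assumption-laden example: a local directory stands in for the S3 bucket, and the function name `wait_for_marker`, the default file name `export.done`, and the polling parameters are all hypothetical, since nothing like this exists in CockroachDB today.

```python
import time
from pathlib import Path


def wait_for_marker(directory, marker_name="export.done",
                    poll_interval=1.0, timeout=60.0):
    """Poll `directory` until `marker_name` exists or `timeout` seconds pass.

    Returns True if the marker was seen, False if the timeout expired first.
    The marker name is hypothetical; the issue only suggests "export.done"
    as one possibility.
    """
    marker = Path(directory) / marker_name
    deadline = time.monotonic() + timeout
    while True:
        if marker.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A consumer reading from real cloud storage would replace the `marker.exists()` check with a listing or HEAD request against the bucket; the point is that it needs no job ID and no access to the cluster.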

733amir commented 1 year ago

According to the documentation, there are multiple sinks, and S3 is one of them. How should this "done" indicator work with each sink?

miretskiy commented 1 year ago

Every sink supports the same interface. For example, EmitResolvedMessage emits a resolved message into any sink: for file-based sinks it writes out a file, and for message-based sinks (Kafka, etc.) it sends a message. The "done" indicator would work similarly.
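
As a rough illustration of that point, the sketch below models the shared interface in Python. It is not the actual CockroachDB code (which is Go): the class names, the `emit_resolved_message` signature, the `.RESOLVED` file naming, and the `finish_export` helper are assumptions made for the example. It only shows how one call can produce a file for a file-based sink and a message for a message-based sink.

```python
from pathlib import Path
from typing import Protocol


class Sink(Protocol):
    """Hypothetical stand-in for the common sink interface."""

    def emit_resolved_message(self, payload: str) -> None: ...


class FileSink:
    """File-based sink: a resolved message becomes a file in a directory."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def emit_resolved_message(self, payload):
        # The file naming scheme here is an assumption for illustration.
        (self.directory / f"{payload}.RESOLVED").write_text(payload)


class MessageSink:
    """Message-based sink: a resolved message is appended to a queue."""

    def __init__(self):
        self.messages = []

    def emit_resolved_message(self, payload):
        self.messages.append({"resolved": payload})


def finish_export(sink, final_timestamp):
    """Emit one final resolved marker before the export terminates.

    Works for any sink because it relies only on the shared method.
    """
    sink.emit_resolved_message(final_timestamp)
```

Because `finish_export` depends only on the shared method, the same "done" signal reaches every sink type without per-sink logic in the caller.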