cockroachdb / cockroach

cdc: avoid duplicate schema registrations #99221

Open HonoreDB opened 1 year ago

HonoreDB commented 1 year ago

When we have a large CRDB cluster starting up an Avro changefeed, each processor will post the same schema registration to the same endpoint at about the same time. Users should have an alternative to provisioning a schema registry endpoint that can handle such spiky traffic.

Fixes:

We can implement this by having the processor that registers the schema tell the others what the ID is. https://github.com/cockroachdb/cockroach/pull/99059 is the beginning of an attempt to implement this by synchronizing using the job_info table.

Another approach (or complementary to above) could be to register schemas before distributing the job so that the IDs can be serialized into the processor specs, but that doesn't prevent potential spikes when a schema change occurs.
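A minimal sketch of the pre-registration idea, for illustration only. `processorSpec`, `planSpecs`, `lookupSchemaID`, and the JSON encoding are all hypothetical names and choices, not CockroachDB's actual spec types, and `registerFunc` merely stands in for the call to the schema registry. The coordinator registers each schema once and embeds the resulting IDs in every processor spec, so processors start without contacting the registry. A schema that appears later (for example after a schema change) is simply missing from the map, which is the gap noted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// processorSpec is a hypothetical, simplified stand-in for a processor spec.
type processorSpec struct {
	ProcessorID int              `json:"processor_id"`
	SchemaIDs   map[string]int32 `json:"schema_ids"` // subject -> registry ID
}

// registerFunc stands in for the request to the schema registry (assumption).
type registerFunc func(subject, schema string) (int32, error)

// planSpecs registers every schema exactly once on the coordinator, then
// serializes the resulting IDs into one spec per processor.
func planSpecs(numProcessors int, schemas map[string]string, register registerFunc) ([][]byte, error) {
	ids := make(map[string]int32, len(schemas))
	for subject, schema := range schemas {
		id, err := register(subject, schema)
		if err != nil {
			return nil, err
		}
		ids[subject] = id
	}
	specs := make([][]byte, 0, numProcessors)
	for i := 0; i < numProcessors; i++ {
		b, err := json.Marshal(processorSpec{ProcessorID: i, SchemaIDs: ids})
		if err != nil {
			return nil, err
		}
		specs = append(specs, b)
	}
	return specs, nil
}

// lookupSchemaID is what a processor could do at startup: decode its spec and
// read the ID. ok is false for schemas unknown at planning time, in which
// case the processor would have to fall back to registering the schema itself.
func lookupSchemaID(rawSpec []byte, subject string) (id int32, ok bool, err error) {
	var spec processorSpec
	if err := json.Unmarshal(rawSpec, &spec); err != nil {
		return 0, false, err
	}
	id, ok = spec.SchemaIDs[subject]
	return id, ok, nil
}

func main() {
	calls := 0
	register := func(subject, schema string) (int32, error) {
		calls++
		return int32(100 + calls), nil
	}
	specs, err := planSpecs(16, map[string]string{"orders-value": `{"type":"record"}`}, register)
	if err != nil {
		panic(err)
	}
	id, ok, _ := lookupSchemaID(specs[15], "orders-value")
	fmt.Println(calls, id, ok) // prints "1 101 true": one registration for 16 processors
}
```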

Jira issue: CRDB-25769

gz#16064

gz#16384

Epic CRDB-25039

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/cdc

miretskiy commented 1 year ago

We should make sure that at the very least, schema registration happens once per node, and not once per parallel processor.
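One way to get once-per-node behavior is a node-wide cache that all parallel encoding workers share, with concurrent requests for the same schema collapsed into a single call. The sketch below is illustrative only: `nodeSchemaCache`, `getOrRegister`, and `registerFunc` are hypothetical names, not the changefeed code, and the registry call is replaced by a counter.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// registerFunc stands in for the request to the schema registry (assumption).
type registerFunc func(subject, schema string) (int32, error)

// entry holds the outcome of one registration; once guarantees a single call.
type entry struct {
	once sync.Once
	id   int32
	err  error
}

// nodeSchemaCache is a hypothetical cache shared by every worker on a node.
type nodeSchemaCache struct {
	mu       sync.Mutex
	entries  map[string]*entry
	register registerFunc
}

func newNodeSchemaCache(register registerFunc) *nodeSchemaCache {
	return &nodeSchemaCache{entries: make(map[string]*entry), register: register}
}

// getOrRegister returns the cached ID, registering at most once per
// (subject, schema) pair. Concurrent callers block until the first call ends.
// Note: this sketch also caches errors; a real implementation would likely retry.
func (c *nodeSchemaCache) getOrRegister(subject, schema string) (int32, error) {
	key := subject + "\x00" + schema
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	c.mu.Unlock()
	e.once.Do(func() { e.id, e.err = c.register(subject, schema) })
	return e.id, e.err
}

// countRegistrations starts `workers` goroutines that all need the same
// schema and reports how many registry calls were actually made.
func countRegistrations(workers int) int64 {
	var calls int64
	cache := newNodeSchemaCache(func(subject, schema string) (int32, error) {
		atomic.AddInt64(&calls, 1)
		return 42, nil
	})
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = cache.getOrRegister("orders-value", `{"type":"record"}`)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&calls)
}

func main() {
	fmt.Println(countRegistrations(8)) // prints 1: eight workers, one registration
}
```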

shermanCRL commented 1 year ago

@HonoreDB Have we gone as far as we intend on this, or more to do?

And, are my updated descriptions in the top comment accurate?

HonoreDB commented 1 year ago

Short term, and for stuff we intend to backport, I think we're done--the number of schema registrations is now O(number of nodes * number of table schema versions). Medium term we should still find a way to get rid of that first factor.

HonoreDB commented 1 year ago

Updated the description slightly.

shermanCRL commented 1 year ago

@HonoreDB Thanks. Remind me, what was the big-O before these changes?

HonoreDB commented 1 year ago

The main factor we took out in #99833 was nprocs, number of parallel encoding workers per node, which defaults to number of cpus per node / 4, to a max of 8. We also had duplicate registrations if there were multiple changefeeds on the same table, or a changefeed was restarted, which #99833 mitigates.

HonoreDB commented 1 year ago

So before you could say O(number of processors * number of table schema versions * number of changefeeds).
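For a rough sense of scale, the two bounds can be compared numerically. This is a back-of-the-envelope sketch, not a measurement: it assumes "number of processors" means nodes times parallel encoding workers per node, takes the worker default described above (CPUs per node / 4, capped at 8), and adds a floor of 1 worker that the thread does not state. All function names are hypothetical.

```go
package main

import "fmt"

// encodingWorkers follows the default described in the thread: CPUs / 4,
// capped at 8. The floor of 1 is an assumption added for this sketch.
func encodingWorkers(cpusPerNode int) int {
	n := cpusPerNode / 4
	if n > 8 {
		n = 8
	}
	if n < 1 {
		n = 1
	}
	return n
}

// registrationsBefore is the old upper bound:
// processors * schema versions * changefeeds, with processors taken
// to be nodes * workers per node (assumption).
func registrationsBefore(nodes, cpusPerNode, schemaVersions, changefeeds int) int {
	return nodes * encodingWorkers(cpusPerNode) * schemaVersions * changefeeds
}

// registrationsAfter is the bound quoted after the fix:
// nodes * schema versions.
func registrationsAfter(nodes, schemaVersions int) int {
	return nodes * schemaVersions
}

func main() {
	// 10 nodes with 32 CPUs each, 3 schema versions, 2 changefeeds on the table.
	fmt.Println(registrationsBefore(10, 32, 3, 2)) // prints 480
	fmt.Println(registrationsAfter(10, 3))         // prints 30
}
```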

amruss commented 1 year ago

Can this be closed?