cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

changefeedccl: look into making table-aware rangefeeds #85155

Closed HonoreDB closed 1 year ago

HonoreDB commented 2 years ago

Currently changefeeds need to restart whenever the primary index for a watched table changes, including cutting over from an old version of a table to a new one. This is because we've planned our work against the old primary key span, with each changefeed aggregator getting a set of spans to watch in its spec. But what if instead each changefeed aggregator got a set of (shard index, table ID) tuples that it could convert into a logical rangefeed request for "the spans on the active primary index of this table corresponding to these shard indices"? It might result in inefficiencies like moving emitters away from leaseholders, but that's what the replanning work is for.
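The (shard index, table ID) idea above can be sketched roughly like this. This is a hypothetical, heavily simplified Go model, not CockroachDB code: `TableShard`, `resolveSpan`, and the `activePrimaryIndex` map are all invented for illustration, standing in for the aggregator spec and a catalog lookup.

```go
package main

import "fmt"

// TableShard is a logical watch target: a (table ID, shard index) tuple,
// carried in the aggregator spec instead of a concrete key span.
type TableShard struct {
	TableID    int
	ShardIndex int
}

// activePrimaryIndex simulates a catalog lookup: the primary index ID for a
// table can change over time (e.g. after an ALTER PRIMARY KEY).
var activePrimaryIndex = map[int]int{42: 1}

// resolveSpan converts a logical (table, shard) tuple into a key-span string
// for the table's *current* primary index. When the primary index changes,
// the same tuple resolves to a new span, so the spec itself never goes stale.
func resolveSpan(ts TableShard) string {
	idx := activePrimaryIndex[ts.TableID]
	return fmt.Sprintf("/Table/%d/%d/shard/%d", ts.TableID, idx, ts.ShardIndex)
}

func main() {
	spec := []TableShard{{TableID: 42, ShardIndex: 0}, {TableID: 42, ShardIndex: 1}}
	for _, ts := range spec {
		fmt.Println(resolveSpan(ts))
	}
	// Simulate a primary key swap: the index ID changes, the tuples do not,
	// so no changefeed restart is implied by the spec.
	activePrimaryIndex[42] = 2
	for _, ts := range spec {
		fmt.Println(resolveSpan(ts))
	}
}
```

The cost noted above still applies: a tuple-based spec says nothing about leaseholder locality, which is why the replanning work would have to pick up the slack.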

Jira issue: CRDB-18067

Epic CRDB-25044

blathers-crl[bot] commented 2 years ago

cc @cockroachdb/cdc

ajwerner commented 2 years ago

RangeFeed is explicitly a KV-level API. Nothing about the RangeFeed concept should ever know about sql-level concepts. Literally one of the key roles of the changefeed layer is to map sql concepts to kv concepts. I think it could be fine to move some logic into the changefeedccl/kvfeed layer, but I wouldn't be eager to move more things there.

What is it that this issue is asking for?

HonoreDB commented 2 years ago

I'd like to entertain the thought that RangeFeed is a wrong abstraction (either altogether, or wrong-for-changefeeds-to-know-about). We don't, in fact, care about ranges. We care about all logical operations that match a certain predicate. When changefeeds were first implemented, those predicates could be not-too-lossily expressed as a cursor plus a single immutable span which could be freely divided into ranges. That conversion has gotten a bit lossier, though, with changes to how schema changes work and new changefeed features. It'd be nice to find a way to have a changefeed's role be to map sql concepts onto predicates-on-logical-operations rather than have it be to map them onto spans and then detect when that mapping fails and restart everything.
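One way to picture "predicates on logical operations": interest is expressed as a function over write events rather than as a fixed set of spans. The sketch below is purely illustrative; `Event` and `matchesActivePrimary` are invented names, not any real CockroachDB API.

```go
package main

import "fmt"

// Event is a toy stand-in for a logical write operation, tagged with the
// table and index it landed in.
type Event struct {
	Key     string
	TableID int
	IndexID int
}

// matchesActivePrimary builds a predicate that matches writes to a table's
// active primary index, whatever that index happens to be when the event is
// evaluated. activeIndex stands in for a catalog lookup.
func matchesActivePrimary(activeIndex func(table int) int) func(Event) bool {
	return func(ev Event) bool {
		return ev.IndexID == activeIndex(ev.TableID)
	}
}

func main() {
	active := 1
	pred := matchesActivePrimary(func(int) int { return active })

	events := []Event{
		{Key: "/Table/42/1/k1", TableID: 42, IndexID: 1},
		{Key: "/Table/42/2/k1", TableID: 42, IndexID: 2}, // backfill write into a new index
	}
	for _, ev := range events {
		fmt.Println(ev.Key, pred(ev))
	}
	// After a primary key swap the same predicate matches the new index;
	// nothing about the subscription had to be torn down.
	active = 2
	for _, ev := range events {
		fmt.Println(ev.Key, pred(ev))
	}
}
```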

miretskiy commented 2 years ago

I'm sort of confused as to the whole premise of this issue...

ajwerner commented 2 years ago

> rather than have it be to map them onto spans and then detect when that mapping fails and restart everything.

Somebody has to map logical sql concepts onto spans. I agree that the optimizer is better suited for that than the code in changefeeds, but that's not related to this issue. At the end of the day, everything needs to be expressed as something interacting with kv key spans. That's true for a SELECT statement as well. Some layer then needs to provide access to those KVs. That's the kvclient/rangefeed layer.

Anything you're suggesting lives above that and thus is part of the changefeed.

HonoreDB commented 2 years ago

Typed a more long-winded thing here but decided not to get too implementationy yet; this was meant to be something we'd discuss and put on the backlog as an exploratory thing during triage. Short version: right now changefeed code calls `desc.PrimaryIndexSpan(execCtx.ExecCfg().Codec)` or `spanbuilder.Init(desc.GetPrimaryIndex()).SpansFromConstraint...` and forwards the resulting spans to a processor. I would rather it called something like `desc.PrimaryIndexSpanSource(...)` and piped that stream into a constrainer stream, which would output spans, or (span, timestamp) tuples, where the timestamp is the earliest timestamp at which write events in that span are considered primary index writes and not secondary/temporary index backfills, and then piped that into the rangefeed API. I don't think that's too controversial. The more speculative piece is that I'd really like it if there were a lower-level API, at the level rangefeed filters are now, with just enough information that we could push that criterion down further, without literally knowing that's what it was doing.
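A minimal sketch of the span-source-plus-timestamps shape described above, with invented names (`SpanEvent`, `primaryIndexSpanSource`) standing in for the real descriptor and rangefeed plumbing; it is a toy model under those assumptions, not the proposed implementation.

```go
package main

import "fmt"

// SpanEvent pairs a primary-index key span with the earliest timestamp at
// which writes in that span count as primary-index writes (as opposed to
// backfill writes into a temporary/secondary index).
type SpanEvent struct {
	Span string
	TS   int64
}

// primaryIndexSpanSource streams span events for one table: the original
// primary index from t=0, then, after a simulated ALTER PRIMARY KEY commits
// at t=100, the span of the new primary index.
func primaryIndexSpanSource() <-chan SpanEvent {
	ch := make(chan SpanEvent, 2)
	ch <- SpanEvent{Span: "/Table/42/1", TS: 0}
	ch <- SpanEvent{Span: "/Table/42/2", TS: 100}
	close(ch)
	return ch
}

func main() {
	// The consumer (standing in for the rangefeed layer) subscribes to each
	// span starting at the event's timestamp instead of restarting the feed.
	for ev := range primaryIndexSpanSource() {
		fmt.Printf("watch %s from ts=%d\n", ev.Span, ev.TS)
	}
}
```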

shermanCRL commented 2 years ago

So predicates and projections get pushed down, is the idea? That the rangefeed gets instantiated/started with projections & predicates, such that the changefeed doesn’t know about them, per se?

ajwerner commented 2 years ago

I think this is all controversial in that any further work to have changefeeds be smarter should be done by having the optimizer plan changefeeds. The question then arises: what are the leaves of the optimizer's plans? Today, the best the optimizer can do is going to be about watching (or scanning) spans and then filtering and projecting above them (i.e. we have a table reader node that feeds into some filter node that feeds into some projection node). Maybe in a future world we can have some sort of smarter table reader which does some sort of push-down as the KV API evolves to permit such a thing. Maybe then such an extension to do some sort of push down could happen in the KV API. Regardless, any pushdown across the KV barrier would be column-id and type oriented and not datum oriented, because KV doesn't deal in datums and today doesn't deal in decoding rows.
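The reader-feeds-filter-feeds-projection shape can be illustrated with a toy plan tree. Everything here (`Row`, `Node`, `spanWatcher`, `filter`, `project`) is invented for illustration; it shows only the composition, not CockroachDB's actual plan nodes.

```go
package main

import "fmt"

// Row is a toy row representation; real plans are column-id and type
// oriented, not string maps.
type Row map[string]string

// Node is a plan node: calling it produces its output rows.
type Node func() []Row

// spanWatcher is the leaf node: it yields rows coming off a watched span.
func spanWatcher(rows []Row) Node {
	return func() []Row { return rows }
}

// filter keeps only rows matching a predicate.
func filter(in Node, pred func(Row) bool) Node {
	return func() []Row {
		var out []Row
		for _, r := range in() {
			if pred(r) {
				out = append(out, r)
			}
		}
		return out
	}
}

// project keeps only the named columns of each row.
func project(in Node, cols ...string) Node {
	return func() []Row {
		var out []Row
		for _, r := range in() {
			p := Row{}
			for _, c := range cols {
				p[c] = r[c]
			}
			out = append(out, p)
		}
		return out
	}
}

func main() {
	leaf := spanWatcher([]Row{
		{"id": "1", "status": "open", "note": "a"},
		{"id": "2", "status": "closed", "note": "b"},
	})
	// projection over a filter over the span-watching leaf
	plan := project(filter(leaf, func(r Row) bool { return r["status"] == "open" }), "id")
	for _, r := range plan() {
		fmt.Println(r["id"])
	}
}
```

Pushing the filter or projection below the leaf is exactly the "pushdown across the KV barrier" question: it would mean teaching the leaf's layer about column IDs and types, which is the evolution of the KV API being discussed.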

miretskiy commented 1 year ago

@HonoreDB I'm still confused about this issue, and if no work is going to happen here (which I suspect is the case), then please close it.