cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

cdc: support window functions #98237

Open HonoreDB opened 1 year ago

HonoreDB commented 1 year ago

Loosely inspired by https://maxhalford.github.io/blog/ogd-in-sql/, especially the last paragraph. Suppose we supported a limited set of PARTITION BY clauses in changefeed expressions, restricted to partitions that are always intervals over the primary key (or that can otherwise be guaranteed to fall within a single changefeed processor). Users could then write arbitrary streaming calculations over events with bounded resources, using the Postgres OVER ... PARTITION BY syntax and semantics. That in turn would let you write changefeeds on things like "unusual values in column X, given column X values in other recently seen rows", or even "values in column Y that are surprising given a multivariate regression on the other columns in the table", or duplicate values, and so on.
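To make the "unusual values in column X" case concrete, here is a rough sketch in plain Python of the kind of bounded, per-partition streaming calculation described above. It is not changefeed syntax and not how CockroachDB would implement it. The function name `unusual_events`, the window size, the threshold, and the sample events are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


def unusual_events(events, window=5, threshold=3.0):
    """Yield (partition_key, value) for values far from their partition's recent mean.

    `events` is an iterable of (partition_key, value) pairs in arrival order.
    Each partition remembers at most `window` earlier values, so memory per
    partition stays bounded no matter how long the stream runs.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    for key, value in events:
        history = recent[key]
        # Need at least two earlier values to estimate a spread.
        if len(history) >= 2:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield key, value
        # The deque drops the oldest value once it holds `window` entries.
        history.append(value)


# Hypothetical stream of (partition_key, column_x) events.
events = [
    ("a", 10), ("a", 12), ("b", 100), ("a", 11), ("a", 10),
    ("a", 50), ("b", 102), ("b", 101), ("b", 100),
]
flagged = list(unusual_events(events))
print(flagged)
```

Because each partition's state is independent and capped at `window` values, the calculation only needs to see the events for its own partition, which is the property that confining a partition to a single changefeed processor would provide.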

(This could also be extended into more of a global map-reduce, using the job record to store state, but that's harder.)

Jira issue: CRDB-25143

Epic CRDB-21713

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/cdc