cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.1k stars 3.81k forks source link

opt: reorder GroupBy and Join operators #47393

Open andy-kimball opened 4 years ago

andy-kimball commented 4 years ago

Implement the GroupBy reordering rules in sections 3.1 and 3.2 of this paper. These sections show how to reorder GroupBy with respect to inner and outer joins.

We should consider two sets of rules:

Normalization: the normal form should either always pull GroupBy operators above joins, or else push them down. I suspect pulling them up would result in fewer transformations. The reason to do this normalization is to try and group together joins with one another. This makes it more likely that the optimizer can remove unnecessary joins. Also, when we do more sophisticated join ordering analysis, we want joins grouped together.

Exploration: from an execution speed point of view, sometimes it's better to join and then group, and sometimes it's better to group and then join. We need the coster to examine both options, and select the option with the lower estimated cost.

Jira issue: CRDB-4417

RaduBerinde commented 4 years ago

See https://github.com/cockroachdb/cockroach/issues/54597 for a concrete case.

jrote1 commented 4 years ago

@RaduBerinde is there any timeline to when this issue will be fixed?

RaduBerinde commented 4 years ago

It is not on our roadmap for the next major release, so I would say the release in fall 2021 at the earliest..

jrote1 commented 4 years ago

I would be keen to look into this, not sure how much help I will be though. Is there any pointers that you could give me? Similar historical PRs etc?

RaduBerinde commented 4 years ago

I'd say the places to start are:

Thank you!

nehageorge commented 3 years ago

The requirements of this issue have changed. See https://github.com/cockroachdb/cockroach/pull/70988#issuecomment-932344330.

remysucre commented 3 years ago

We have encountered similar issues in nearly all projects using equality saturation, because almost all languages contain arithmetics and have rules for associativity, commutativity, and distributivity. Applying the rules until fix point (equivalent to growing the memo until it stops changing) will suffer from exponential explosion. Our solution is to timeout after a number of rule applications, while scheduling each rule so that no one dominates. SQL Server takes a similar approach, using timeouts and stages. So to implement this issue, one solution is to do timeout and scheduling for the rewrite rules, then run the DP join reordering at the end. This does not require implementing the new "groupjoin" operator described in the paper linked from the comment above.

Theoretically, the optimal way to solve this is Functional Aggregate Queries (paper, slides). The paper looks scary but the main ideas are rather simple (variable elimination, indicator projection). Still, it would require major effort by a highly experienced team to implement. Hung and Mahmoud are the people behind relationalAI. It's possible that FAQ is related to the paper on groupjoins.

nehageorge commented 2 years ago

Removing my assignment as I will no longer be working on this.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

andy-kimball commented 1 year ago

Still relevant.