Dolt takes forever to generate join plans for a join of many tables. #6713

Closed nicktobey closed 1 year ago

nicktobey commented 1 year ago

The following is a test query from sqllogictests:

SELECT x63,x53,x62,x52,x11,x5,x40,x64,x27,x28,x21,x41,x22,x30,x16,x14,x56,x32,x46,x50,x1,x34
  FROM t46,t34,t1,t32,t53,t21,t63,t11,t30,t62,t27,t50,t16,t64,t40,t56,t22,t28,t52,t5,t41,t14
 WHERE a21=b5
   AND b30=a52
   AND a62=b46
   AND a14=3
   AND b52=a28
   AND b53=a14
   AND a63=b28
   AND b40=a56
   AND a11=b64
   AND a53=b22
   AND b1=a34
   AND b32=a41
   AND a50=b63
   AND a64=b62
   AND b11=a30
   AND b27=a40
   AND a22=b56
   AND b21=a46
   AND a1=b50
   AND b34=a16
   AND a27=b16
   AND a5=b41;

MySQL runs this query in under a second. Dolt takes at least several minutes just to generate the plan.

I suspect (but haven't measured yet) that we are testing every possible join order in the "reorder joins" optimization, which leads to an exponential blowup in join planning. We may need a better heuristic for ordering joins that doesn't require generating and costing every possible ordering.

For now, as a stopgap measure, we should probably choose some cutoff when there are too many joins to quickly check every ordering, and just bail out and choose a "good enough" ordering.
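
To put a rough number on the blowup: with 22 tables, even restricting to left-deep plans there is one candidate ordering per permutation of the tables, i.e. 22! of them. A quick standalone Go check (not Dolt code):

package main

import (
    "fmt"
    "math/big"
)

// Counts the left-deep join orderings of n tables (one per permutation).
// Bushy plans are even more numerous, so this is a lower bound on what an
// exhaustive reorder-joins search would have to generate and cost.
func main() {
    n := int64(22)
    count := new(big.Int).MulRange(1, n) // 22! = 1 * 2 * ... * 22
    fmt.Printf("left-deep orderings of %d tables: %s\n", n, count) // ~1.1e21
}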

nicktobey commented 1 year ago

For context, here is the plan that MySQL generates:

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | t14   | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t53   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where                                |
|  1 | SIMPLE      | t22   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t56   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t40   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t27   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t16   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t34   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t1    | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t50   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t63   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t28   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t52   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t30   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t11   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t64   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t62   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t46   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t21   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t5    | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t41   | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
|  1 | SIMPLE      | t32   | NULL       | ALL   | NULL          | NULL    | NULL    | NULL  |   10 |    10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+--------------------------------------------+

Notably, it does reorder the joins.

It's also worth pointing out that each table tN is keyed on its aN column, so each join here is a join on a primary key.

nicktobey commented 1 year ago

Unfortunately, I don't know if this is going to be as simple as "cut off after a certain number of join plans", because there is a correct join order here, which MySQL finds and we don't. It's the join order where each filter can be pushed down to sit right above its relevant join. Each filter that we fail to push all the way down makes the runtime exponentially slower.

Here's a description of how a person using pen-and-paper might find the correct join order:

1) Commute the expressions so that all the primary keys are on the right (for readability):

SELECT x63,x53,x62,x52,x11,x5,x40,x64,x27,x28,x21,x41,x22,x30,x16,x14,x56,x32,x46,x50,x1,x34
  FROM t46,t34,t1,t32,t53,t21,t63,t11,t30,t62,t27,t50,t16,t64,t40,t56,t22,t28,t52,t5,t41,t14
 WHERE b5=a21
   AND b30=a52
   AND b46=a62
   AND a14=3
   AND b52=a28
   AND b53=a14
   AND b28=a63
   AND b40=a56
   AND b64=a11
   AND b22=a53
   AND b1=a34
   AND b32=a41
   AND b63=a50
   AND b62=a64
   AND b11=a30
   AND b27=a40
   AND b56=a22
   AND b21=a46
   AND b50=a1
   AND b34=a16
   AND b16=a27
   AND b41=a5;

2) Reorder the filters like so:

SELECT x63,x53,x62,x52,x11,x5,x40,x64,x27,x28,x21,x41,x22,x30,x16,x14,x56,x32,x46,x50,x1,x34
  FROM t46,t34,t1,t32,t53,t21,t63,t11,t30,t62,t27,t50,t16,t64,t40,t56,t22,t28,t52,t5,t41,t14
 WHERE b32=a41
   AND b41=a5
   AND b5=a21
   AND b21=a46
   AND b46=a62
   AND b62=a64
   AND b64=a11
   AND b11=a30
   AND b30=a52
   AND b52=a28
   AND b28=a63
   AND b63=a50
   AND b50=a1
   AND b1=a34
   AND b34=a16
   AND b16=a27
   AND b27=a40
   AND b40=a56
   AND b56=a22
   AND b22=a53
   AND b53=a14
   AND a14=3;

3) Order the joins and push down the filters like so:

SELECT x63,x53,x62,x52,x11,x5,x40,x64,x27,x28,x21,x41,x22,x30,x16,x14,x56,x32,x46,x50,x1,x34
  FROM t32 JOIN t41 ON b32=a41 JOIN t5 ON b41=a5 JOIN t21 ON b5=a21 JOIN t46 ON b21=a46 JOIN t62 ON b46=a62 JOIN t64 ON b62=a64 JOIN t11 ON b64=a11 JOIN t30 ON b11=a30 JOIN t52 ON b30=a52 JOIN t28 ON b52=a28 JOIN t63 ON b28=a63 JOIN t50 ON b63=a50 JOIN t1 ON b50=a1 JOIN t34 ON b1=a34 JOIN t16 ON b34=a16 JOIN t27 ON b16=a27 JOIN t40 ON b27=a40 JOIN t56 ON b40=a56 JOIN t22 ON b56=a22 JOIN t53 ON b22=a53 JOIN (t14 WHERE a14=3) ON b53=a14;

This is the only efficient table ordering for this query, so a brute-force search isn't going to stumble onto it without considering a huge number of possible orderings.
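
For what it's worth, the procedure above is mechanical enough to sketch in a few lines. This is standalone Go, not the analyzer's code; the edge map is hand-transcribed from the WHERE clause, and it leans on the fact that each aN is tN's primary key and each equality links exactly two tables:

package main

import "fmt"

// A sketch of how the pen-and-paper procedure above could be automated.
// Each WHERE predicate "bX = aY" is treated as an edge: the table owning bX
// must already be in the join before tY can be added via a lookup on its
// primary key aY. For this query the edges form a single chain, so the order
// falls out by starting at the one table whose primary key is never
// referenced (t32) and following the edges.
func main() {
    // edges[X] = Y means the query contains the predicate bX = aY.
    edges := map[int]int{
        32: 41, 41: 5, 5: 21, 21: 46, 46: 62, 62: 64, 64: 11, 11: 30,
        30: 52, 52: 28, 28: 63, 63: 50, 50: 1, 1: 34, 34: 16, 16: 27,
        27: 40, 40: 56, 56: 22, 22: 53, 53: 14,
    }

    // The chain's head is a table that appears as a source but never as a target.
    isTarget := map[int]bool{}
    for _, to := range edges {
        isTarget[to] = true
    }
    head := -1
    for from := range edges {
        if !isTarget[from] {
            head = from
        }
    }

    // Walk the chain to recover the join order from step 3 above.
    order := []int{head}
    for cur := head; ; {
        next, ok := edges[cur]
        if !ok {
            break
        }
        order = append(order, next)
        cur = next
    }
    fmt.Println(order) // [32 41 5 21 46 62 64 11 30 52 28 63 50 1 34 16 27 40 56 22 53 14]
}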

timsehn commented 1 year ago

Time to inspect MySQL's code?

nicktobey commented 1 year ago

I suspect that MySQL treats this as a constraint problem:

For example, the filter expression b28=a63 (where a63 is a primary key and b28 is unindexed) is only optimizable when it's pushed onto a join between t63 (which can be accessed via its primary index) and a child node that already contains t28.

Each filter defines a constraint. If there exist orderings that satisfy all of these constraints, it chooses one of those orderings without considering any others.

In this case, there is exactly one ordering that satisfies all the constraints, and it's pretty trivial for a solver to find.

nicktobey commented 1 year ago

I believe that we already do some handling of this.

It's a very common pattern to write a series of joins where each join introduces a new table and its ON clause filters on that table's primary key. If that filter matches against columns from the already-joined tables, then it imposes a constraint that prevents us from reordering those tables in the join.

But that handling likely isn't applying here because these aren't ON expressions, they're WHERE expressions. They don't constrain join planning until we push them to be above the relevant JOIN nodes... which we can't do until we decide on a join order... it's a chicken-and-egg problem.

This behavior isn't limited to hypothetical test cases either.

A while back we were investigating a slow query that was a long chain of left joins. We had a theory that there was a more efficient join order that the analyzer wasn't able to pick because the left joins were imposing a specific join order. But when we replaced the left joins with inner joins, the performance degraded even further (and significantly so). I now suspect that we may have been running afoul of this issue.

nicktobey commented 1 year ago

This is also a place where Functional Dependencies may be able to help.

Functional Dependency analysis is where we determine whether one column in a result set uniquely constrains another column in that result set.

So in this example:

b32=a41, and a41 is t41's primary key, so b32 uniquely determines every column of t41, including b41.
b41=a5, and a5 is t5's primary key, so b41 (and therefore b32) uniquely determines every column of t5, including b5.

And so on. The end result is that b32 ends up constraining every other column in the result set.
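
A minimal sketch of the closure computation this implies (standalone Go, not Dolt's actual FD machinery; the dependency map is hand-transcribed from the query):

package main

import "fmt"

// Each join predicate bX = aY, with aY a primary key, gives the functional
// dependency {bX} -> {every column of tY}; here we track just aY and bY.
// Starting from {b32} and repeatedly applying these FDs shows that b32
// determines every other a and b column in the result set.
func main() {
    // fds[det] lists columns determined by det.
    fds := map[string][]string{
        "b32": {"a41", "b41"}, "b41": {"a5", "b5"}, "b5": {"a21", "b21"},
        "b21": {"a46", "b46"}, "b46": {"a62", "b62"}, "b62": {"a64", "b64"},
        "b64": {"a11", "b11"}, "b11": {"a30", "b30"}, "b30": {"a52", "b52"},
        "b52": {"a28", "b28"}, "b28": {"a63", "b63"}, "b63": {"a50", "b50"},
        "b50": {"a1", "b1"}, "b1": {"a34", "b34"}, "b34": {"a16", "b16"},
        "b16": {"a27", "b27"}, "b27": {"a40", "b40"}, "b40": {"a56", "b56"},
        "b56": {"a22", "b22"}, "b22": {"a53", "b53"}, "b53": {"a14", "b14"},
    }

    // Standard attribute-closure loop: keep applying FDs until nothing new is added.
    closure := map[string]bool{"b32": true}
    for changed := true; changed; {
        changed = false
        for det, cols := range fds {
            if !closure[det] {
                continue
            }
            for _, c := range cols {
                if !closure[c] {
                    closure[c] = true
                    changed = true
                }
            }
        }
    }
    fmt.Println(len(closure)) // 43: b32 plus the a and b columns of the other 21 tables
}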

Knowing this is extremely helpful for planning because we know that the cardinality of the result set can never exceed the cardinality of b32's original table (provided that we join tables in the correct order, such that at each intermediate step, b32 constrains every other column in the intermediate result set.)

We use this knowledge when costing join plans. But what we could/should be doing is using this knowledge to constrain the search space: we only ever generate plans where this constraint holds.

I suspect this is an equivalent problem to the constraint solving that I mentioned two comments ago.

nicktobey commented 1 year ago

(Also for full context, Dolt does eventually generate a plan after 30 minutes on my machine. The plan it generates is not the optimal plan that MySQL generates, but it's close enough that plan execution still takes a negligible amount of time. I'm attaching the plan that Dolt generates.) plan.txt

It's possible that enough "close enough" plans exist for this test case that merely setting a limit to how many plans the analyzer considers will suffice as a stop-gap measure.

max-hoffman commented 1 year ago

At a high level, I think MySQL does bottom-up join planning, where it just tries to accumulate the next best lookup from a kernel node. We do more top-down planning, which generates more plans but is prone to traversing too much of the search space when we aren't careful (we aren't). Functional dependencies are a good way to check whether we have found an optimal join plan for a given group, which would let us avoid trying to optimize that group any further. This mixes the best of both worlds -- flexibility to continue exploring variations at a level in the join tree, and a lever to stop exploring once we've found an optimal plan.

A peephole optimization that uses FDs to prune join exploration sounds nice; how to get there practically without refactoring more of join exploration is hazy to me. The problem in my head is still the ordering problem -- right now we (1) generate all plans, (2) cost all plans. In an ideal world we would (1) generate a new plan, (2) cost that plan, (3) decide whether we want to continue exploring new plans. Is there a middle ground that gets us most of the benefit? Do we just need to refactor join exploration to generate new plans on demand?
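
Just to make the "generate, cost, decide" shape concrete, here's a sketch with a hypothetical callback-style API (standalone Go, not go-mysql-server's interfaces, and the cost function is a stand-in just so the example runs):

package main

import "fmt"

// enumerate yields candidate orderings one at a time; it stops as soon as
// yield returns false, so the caller decides how much of the space to explore.
func enumerate(tables []int, yield func(order []int) bool) {
    var rec func(prefix, rest []int) bool
    rec = func(prefix, rest []int) bool {
        if len(rest) == 0 {
            return yield(prefix) // (1) a newly generated plan
        }
        for i := range rest {
            nextRest := append(append([]int(nil), rest[:i]...), rest[i+1:]...)
            if !rec(append(append([]int(nil), prefix...), rest[i]), nextRest) {
                return false
            }
        }
        return true
    }
    rec(nil, tables)
}

// cost is a stand-in model: later positions weigh more, nothing else.
func cost(order []int) float64 {
    c := 0.0
    for i, t := range order {
        c += float64((i + 1) * t)
    }
    return c
}

func main() {
    budget := 50
    bestCost, bestOrder := -1.0, []int(nil)
    enumerate([]int{9, 4, 7, 1, 3}, func(order []int) bool {
        if c := cost(order); bestCost < 0 || c < bestCost { // (2) cost that plan
            bestCost, bestOrder = c, append([]int(nil), order...)
        }
        budget--
        return budget > 0 // (3) decide whether to keep generating plans
    })
    fmt.Println(bestOrder, bestCost)
}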

zachmu commented 1 year ago

> I suspect (but haven't measured yet) that we are testing every possible join order in the "reorder joins" optimization, which leads to an exponential blowup in join planning. We may need a better heuristic for ordering joins that doesn't require generating and costing every possible ordering.

It's been a while since I've been in there, but this sounds like a search constraint problem. If we have a best cost so far, then we can prune the search space if the incremental cost ever goes above that as we're generating potential plans.

The key is that you don't need to generate an entire plan before pruning that part of the search space. Lots of table orderings are going to be predictably terrible; there's no need to keep exploring those dead ends.
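
Something like this, as a standalone sketch (hypothetical cost model, not the analyzer's; the point is that whole subtrees of orderings get abandoned as soon as their partial cost exceeds the best complete plan so far):

package main

import "fmt"

// Cost-bound pruning over join orderings: extend a partial ordering one table
// at a time, track the cheapest complete plan seen so far, and abandon any
// partial ordering whose accumulated cost already exceeds it.
func main() {
    tables := []int{3, 1, 4, 1, 5, 9, 2, 6} // stand-in per-table "row counts"
    best := -1.0
    var bestOrder []int // indices into tables, in join order

    var search func(order []int, used uint, cost float64)
    search = func(order []int, used uint, cost float64) {
        if best >= 0 && cost >= best {
            return // prune: every extension of this partial plan costs even more
        }
        if len(order) == len(tables) {
            best, bestOrder = cost, order
            return
        }
        for i, t := range tables {
            if used&(1<<i) != 0 {
                continue
            }
            // Stand-in incremental cost: joining a bigger table deeper in the plan costs more.
            step := float64(t) * float64(len(order)+1)
            search(append(append([]int(nil), order...), i), used|(1<<i), cost+step)
        }
    }
    search(nil, 0, 0)
    fmt.Println(bestOrder, best)
}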