cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.07k stars 3.8k forks source link

sql: subquery results cannot be used to constrain index scans #23264

Open jordanlewis opened 6 years ago

jordanlewis commented 6 years ago

Filter conditions obtained from subqueries don't constrain the spans for a scan.

root@:26257/test> create table foo (a int primary key);
CREATE TABLE
root@:26257/test> explain select * from foo where a=1;
+------+-------+-------------+
| Tree | Field | Description |
+------+-------+-------------+
| scan |       |             |
|      | table | foo@primary |
|      | spans | /1-/1/#     |
+------+-------+-------------+

root@:26257/test> explain select * from foo where a=(select 1);
+------------------------------+-----------+-------------+
|             Tree             |   Field   | Description |
+------------------------------+-----------+-------------+
| root                         |           |             |
|  ├── scan                    |           |             |
|  │                           | table     | foo@primary |
|  │                           | spans     | ALL         |
|  └── subquery                |           |             |
|       │                      | id        | @S1         |
|       │                      | sql       | (SELECT 1)  |
|       │                      | exec mode | one row     |
|       └── limit              |           |             |
|            └── render        |           |             |
|                 └── emptyrow |           |             |
+------------------------------+-----------+-------------+

In the second explain, I expect that the scan should have spans = /1-/1/# like in the first explain.

Looks like this is the same behavior as in 1.1. Maybe this is a known limitation?

@RaduBerinde for triage.

Jira issue: CRDB-5827

jordanlewis commented 6 years ago

This also occurs when using IN instead of = in the filter condition, BTW.

RaduBerinde commented 6 years ago

We don't (can't?) currently use the results of subqueries for span generation. This is a known limitation of the way we handle subqueries. CC @knz

knz commented 6 years ago

we could (both with current code and new code) for uncorrelated subqueries by evaluating the subqueries prior to span generation.

jordanlewis commented 4 years ago

A forum user ran into this, here: https://forum.cockroachlabs.com/t/query-too-slow-does-full-scan-when-using-subquery/3740

igremmerlb commented 3 years ago

Just chiming in here. We are running into this issue with an INSERT ... ON CONFLICT ... DO UPDATE SET query. All relevant parts have indexes, but it does a full table scan for the table with primary key instead of just using the primary key index.

andy-kimball commented 3 years ago

It would be trivial to hoist uncorrelated subqueries (see deriveHasHoistableSubquery where we only hoist subqueries that are correlated). The only reason I did not do that originally is because I was unsure if that was always the best plan. We'd need to do more testing and experimentation to decide if that's true. If it's not, then it gets quite a bit more complex, as we have to either find heuristics to decide whether to hoist, or use exploration rule(s). If we find we have to go that direction, then maybe we can find certain special cases where it's always beneficial to hoist (like when it's possible to constrain an index scan).

irfansharif commented 3 years ago

https://github.com/cockroachdb/cockroach/pull/69047#pullrequestreview-738495478 is an example of where we'd have benefited from this optimization, though we ended up finding another formulation to give us what we want.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

andy-kimball commented 1 year ago

Not yet addressed.

michae2 commented 4 months ago

This seems to be fixed now (as of 24.1):

demo@127.0.0.1:26257/demoapp/defaultdb> create table foo (a int primary key);
CREATE TABLE

Time: 5ms total (execution 4ms / network 1ms)

demo@127.0.0.1:26257/demoapp/defaultdb> explain select * from foo where a=(select 1);
          info
-------------------------
  distribution: local
  vectorized: true

  • scan
    missing stats
    table: foo@foo_pkey
    spans: [/1 - /1]
(7 rows)

Time: 8ms total (execution 8ms / network 0ms)

demo@127.0.0.1:26257/demoapp/defaultdb> explain select * from foo where a=(select max(a) from foo);
                                         info
---------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join
  │ estimated row count: 1
  │ table: foo@foo_pkey
  │ equality: (any_not_null) = (a)
  │ equality cols are key
  │
  └── • group (scalar)
      │ estimated row count: 1
      │
      └── • revscan
            estimated row count: 1 (100% of the table; stats collected 2 minutes ago)
            table: foo@foo_pkey
            spans: LIMITED SCAN
            limit: 1
(17 rows)

Time: 4ms total (execution 3ms / network 1ms)
yuzefovich commented 4 months ago

I don't think it's addressed generally speaking, only in specific cases.

DrewKimball commented 4 months ago

[triage]: There's an optimizer rule that replaces min/max with a limit, and another that works for equality with a subquery: https://github.com/cockroachdb/cockroach/pull/100881. We might not be able to handle inequalities with subqueries, or other special cases that don't have existing rules.

michae2 commented 4 months ago

Here's an example that's still unconstrained on v24.1:

demo@127.0.0.1:26257/demoapp/defaultdb> CREATE TABLE a (a INT PRIMARY KEY);
CREATE TABLE

Time: 3ms total (execution 3ms / network 0ms)

demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN SELECT * FROM a WHERE a > (SELECT * FROM a ORDER BY a LIMIT 1);
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • root
  │
  ├── • filter
  │   │ filter: a > @S1
  │   │
  │   └── • scan
  │         missing stats
  │         table: a@a_pkey
  │         spans: FULL SCAN
  │
  └── • subquery
      │ id: @S1
      │ original sql: (SELECT * FROM a ORDER BY a LIMIT 1)
      │ exec mode: one row
      │
      └── • scan
            missing stats
            table: a@a_pkey
            spans: LIMITED SCAN
            limit: 1
(23 rows)

Time: 8ms total (execution 7ms / network 0ms)
DrewKimball commented 4 months ago

[triage] possible dupes/related issues: https://github.com/cockroachdb/cockroach/issues/51820 https://github.com/cockroachdb/cockroach/issues/93829 https://github.com/cockroachdb/cockroach/issues/100855