cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.14k stars 3.81k forks source link

Automatic prepare and execute of point queries #92000

Open msirek opened 2 years ago

msirek commented 2 years ago

Is your feature request related to a problem? Please describe. Significant optimization time is taken to run simple non-parameterized primary key SELECT statements. A simple experiment running 10,000 PK lookups with PREPARE and EXECUTE vs. equivalent non-parameterized SELECTs shows approximately 25% less CPU time and 45% less wall-clock time (as measured via the time command):

create table edges(
        id1 INT,
        id2 INT,
        type INT,
        timestamp bigint,
        value varchar(150),
        primary key (id1, id2, type));
insert into edges values (111328908496399027, 111330008008026848, 2, 123456789123456789, 'Hello world!');
-- parameterized:
PREPARE s1 AS SELECT "timestamp", value FROM edges WHERE id1 = $1 AND id2 = $2 AND type = $3;
EXECUTE s1(111328908496399027, 111330008008026848, 2);
EXECUTE s1(111328908496399027, 111330008008026848, 2);
...

-- non-parameterized:
SELECT "timestamp", value FROM edges WHERE id1 = 111328908496399027 AND id2 = 111330008008026848 AND type = 2;
SELECT "timestamp", value FROM edges WHERE id1 = 111328908496399027 AND id2 = 111330008008026848 AND type = 2;
...
-- parameterized results:
real    0m5.540s
user    0m1.714s
sys     0m0.842s

-- non-parameterized results:
real    0m9.687s
user    0m2.206s
sys     0m1.225s

There is a lot of variability in the results from run to run, but parameterized queries are consistently faster.

Describe the solution you'd like When a non-parameterized primary key lookup query is planned, create a parameterized version of it and place it in the statement cache. On future submissions of PK lookup queries, check if the AST is in the form of a simple PK SELECT, and if so, check if an equivalent parameterized query has been cached. If found in cache, run the statement as an EXECUTE of the cached prepared statement.

Describe alternatives you've considered Build a new memo caching mechanism. If the AST is a simple PK SELECT, build a fingerprint of the query involving the table ID, columns selected, and predicate columns. Cache a pruned memo for the query which only includes a PK-constrained ScanExpr using a new statement caching mechanism that matches on query signature ignoring constant values in the query predicates. Provide an efficient mechanism to share most of the memo between requests, except for the scan Constraint which is efficiently rebuilt for each matched request.

Additional context In the follow profile of non-parameterized primary key lookups, the goal is to eliminate a lot of the overhead in execbuilder.(*Builder).Build, optbuilder.(*Builder).BuildWhere, xform.(*explorer).exploreSelect, and xform.(*Optimizer).optimizeGroupMember, plus a few other paths. We may not be able to avoid the overhead of optbuilder.(*Builder).resolveDataSource because we need to know if the table has changed since the query was optimized. image

Jira issue: CRDB-21514

mgartner commented 1 year ago

We should check how common it is for real world workloads to not use prepared statements, if our telemetry can tell us that. AFAIK many ORMs perform prepared statements automatically, as it's the best way to prevent against SQL injection.