cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.96k stars 3.79k forks source link

sql: enable upsert fast path for eligible INSERT .. ON CONFLICT queries #14482

Open danhhz opened 7 years ago

danhhz commented 7 years ago

The fast path is currently only enabled when the UPSERT alias is explicitly selected by the user. It's possible to fast path some queries of the form INSERT ... ON CONFLICT, but the utility is low and there are lots of edge cases (that caused real correctness bugs #13437 #13962). As a result, we've decided to remove this until after 1.0 and re-enable it then.

TODO(dan): Once it's merged, link to the PR that turned this off so the code can be resurrected.

Jira issue: CRDB-6104

gz#19028

jseldess commented 7 years ago

Docs todo: Add note to UPSERT docs stating that, if you specify all columns and there are no secondary indexes, UPSERT will be faster than the equivalent INSERT .. ON CONFLICT (won't read before writing).

Note: This may be interesting to users simulating kv storage via a 2-column SQL table.

jseldess commented 7 years ago

Docs note added with https://github.com/cockroachdb/docs/pull/1269.

jordanlewis commented 5 years ago

Hey Matt, this seems relevant to earlier discussion.

maddyblue commented 5 years ago

Oh nice. Yes I will take this.

asubiotto commented 4 years ago

Hey @RaduBerinde, this seems optimizer-related. Does it make sense to keep open?

RaduBerinde commented 4 years ago

Yeah, this is still true. These two statements are equivalent but have different plans:

> create table abc (a int primary key, b int, c int);
> explain upsert into abc values (1,1,1),(2,2,2);
          tree          |    field    |    description
------------------------+-------------+--------------------
                        | distributed | false
                        | vectorized  | false
  count                 |             |
   └── upsert           |             |
        │               | into        | abc(a, b, c)
        │               | strategy    | opt upserter
        │               | auto commit |
        └── render      |             |
             └── values |             |
                        | size        | 3 columns, 2 rows
(10 rows)

Time: 488.44µs

> explain insert into abc values (1,1,1),(2,2,2) on conflict(a) do update set (b,c)=(excluded.b,excluded.c);
             tree            |       field        |    description
-----------------------------+--------------------+--------------------
                             | distributed        | false
                             | vectorized         | false
  count                      |                    |
   └── upsert                |                    |
        │                    | into               | abc(a, b, c)
        │                    | strategy           | opt upserter
        │                    | auto commit        |
        └── render           |                    |
             └── hash-join   |                    |
                  │          | type               | left outer
                  │          | equality           | (column1) = (a)
                  │          | right cols are key |
                  ├── values |                    |
                  │          | size               | 3 columns, 2 rows
                  └── scan   |                    |
                             | table              | abc@primary
                             | spans              | ALL
mgartner commented 2 years ago

Supporting an INSERT ON CONFLICT DO NOTHING might make a good starter or intern project. The semantics are similar to UPSERT, but instead of updating the existing row when there is a conflict, INSERT ON CONFLICT DO NOTHING succeeds without performing any writes. To implement this I think we could do something like: CPut with a nil expected value, and if the key already existed swallow the error.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

michae2 commented 11 months ago

Supporting an INSERT ON CONFLICT DO NOTHING might make a good starter or intern project. The semantics are similar to UPSERT, but instead of updating the existing row when there is a conflict, INSERT ON CONFLICT DO NOTHING succeeds without performing any writes. To implement this I think we could do something like: CPut with a nil expected value, and if the key already existed swallow the error.

This does seem like a great first part to bite off. Not promising, but we can at least consider this during 24.1 planning.

mgartner commented 11 months ago
marcus=# CREATE TABLE t (k INT PRIMARY KEY, a INT);
CREATE TABLE

marcus=# INSERT INTO t VALUES (1, 10);
INSERT 0 1

marcus=# INSERT INTO t VALUES (2, 20), (1, 11) ON CONFLICT DO NOTHING;
INSERT 0 1

marcus=# SELECT * FROM t;
 k | a
---+----
 1 | 10
 2 | 20
(2 rows)

I think my prior comment was a bit naive. This will be complicated for multi-row inserts on tables with multiple indexes because we'd have to keep track of which KVs correspond to which insert rows, and retroactively delete keys for a row if one of the keys failed to CPut. I'm uneasy about the complexity of that.

Maybe for starters we can limit to single-row inserts and/or tables with only a primary index.

michae2 commented 5 months ago

(See also https://github.com/cockroachdb/cockroach/issues/71153.)