Nican opened this issue 1 year ago
Hello, I am Blathers. I am here to help you get the issue triaged.
Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.
:owl: Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.
Thanks for the report! Have you tried with v22.2.0?
@rafiss Thanks for the response. I should have tried that.
I tried with v22.2.0, v22.2.0-rc.1, v22.2.0-beta.1, and v22.2.0-alpha.1; they are all broken and display the 5s contention behavior. v22.1.12 works as expected.
Hello @rafiss,
I was able to create a minimal repro of the issue. Note that each command needs to be run on a separate connection. This is running v22.2.1.
Create table:
-- drop table if exists dealers cascade;
CREATE TABLE dealers (
    "organizationId" UUID NOT NULL,
    "conventionId" INT8 NOT NULL,
    id INT8 NOT NULL,
    "userId" INT8 NULL,
    CONSTRAINT dealers_pkey PRIMARY KEY ("organizationId" ASC, id ASC),
    UNIQUE INDEX "dealers_userId_conventionId_key" ("userId" ASC, "conventionId" ASC),
    UNIQUE INDEX "dealers_conventionId_key" ("conventionId" ASC, id ASC)
);
Connection 1 (10ms):
insert into dealers values ('00000000-0000-0000-0000-000000000000', 5427958, 1, 1);
Connection 2 (Takes 5100ms to run):
START TRANSACTION;
select * from dealers where "conventionId" = 5427958 and id = 1 for update;
COMMIT;
select * from dealers;
Oddly, removing the dealers_userId_conventionId_key index from the table also fixes the issue, as does removing for update.
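For reference, a minimal sketch of the two workarounds just mentioned (not verified against this exact repro; dropping the unique index may require CASCADE because it backs a UNIQUE constraint):

```sql
-- Workaround 1 (sketch): drop the index whose removal was observed to fix the issue.
DROP INDEX dealers@"dealers_userId_conventionId_key" CASCADE;

-- Workaround 2 (sketch): keep the schema but drop the locking clause.
START TRANSACTION;
SELECT * FROM dealers WHERE "conventionId" = 5427958 AND id = 1;
COMMIT;
```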
Please let me know if you are able to repro the issue.
I can't repro this on 22.2.1. What is the exact order/interleaving of the commands you're running on each connection?
Ok, I could reproduce it. The two-connection setup is a red herring; you can repro on one connection like this:
root@127.0.0.1:26257/defaultdb> create table a (a int, b int, c int, primary key(a), unique index(b));
root@127.0.0.1:26257/defaultdb> insert into a values(1,2,3);
root@127.0.0.1:26257/defaultdb> select * from a where b = 2 for update;
a | b | c
----+---+----
1 | 2 | 3
(1 row)
Time: 3ms total (execution 3ms / network 0ms)
root@127.0.0.1:26257/defaultdb> select * from a where b = 2 for update;
a | b | c
----+---+----
1 | 2 | 3
(1 row)
Time: 4.542s total (execution 4.541s / network 0.001s)
Once you have the select for update in the shell, just hit up/enter quickly and you'll see the slow query.
I think this has to do with the FOR UPDATE locking strength on the induced index join somehow, since the issue doesn't repro unless the plan has an index join.
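One way to check that hypothesis (a sketch, not run in this thread): a query that only needs columns stored in a_b_key should avoid the index join, while the original query needs one to fetch column c, so comparing the two plans should show the difference.

```sql
-- Should be satisfiable from the secondary index alone (no index join expected):
EXPLAIN SELECT a, b FROM a WHERE b = 2 FOR UPDATE;
-- Needs the index join to fetch c (the plan shown in the EXPLAIN ANALYZE below):
EXPLAIN SELECT * FROM a WHERE b = 2 FOR UPDATE;
```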
EXPLAIN ANALYZE output on the slow iteration looks like this:
root@127.0.0.1:26257/defaultdb> explain analyze select * from a where b = 2 for update;
info
------------------------------------------------------------------------------------
planning time: 614µs
execution time: 4.5s
distribution: local
vectorized: true
rows read from KV: 2 (23 B, 2 gRPC calls)
cumulative time spent in KV: 4.5s
cumulative time spent due to contention: 4.5s
maximum memory usage: 60 KiB
network usage: 0 B (0 messages)
regions: us-east1
• index join
│ nodes: n1
│ regions: us-east1
│ actual row count: 1
│ KV time: 1ms
│ KV contention time: 0µs
│ KV rows read: 1
│ KV bytes read: 13 B
│ KV gRPC calls: 1
│ estimated max memory allocated: 30 KiB
│ estimated max sql temp disk usage: 0 B
│ estimated row count: 1
│ table: a@a_pkey
│ locking strength: for update
│
└── • scan
nodes: n1
regions: us-east1
actual row count: 1
KV time: 4.5s
KV contention time: 4.5s
KV rows read: 1
KV bytes read: 10 B
KV gRPC calls: 1
estimated max memory allocated: 20 KiB
estimated row count: 1 (100% of the table; stats collected 47 seconds ago)
table: a@a_b_key
spans: [/2 - /2]
locking strength: for update
(40 rows)
Note the 4.5 second "contention time".
Bisected to b72c10955f5fab3f4d069568169d09a1934cbe15 (#77878), which unfortunately doesn't tell us much. Is the streamer using APIs incorrectly, or is there a bug in the API that the streamer is using?
cc @yuzefovich
Sure enough, disabling the streamer cluster setting removes the problem:
demo@127.0.0.1:26257/defaultdb> SET CLUSTER SETTING sql.distsql.use_streamer.enabled = false;
demo@127.0.0.1:26257/defaultdb> select * from a where b = 2 for update;
a | b | c
----+---+----
1 | 2 | 3
(1 row)
Time: 3ms total (execution 3ms / network 0ms)
demo@127.0.0.1:26257/defaultdb> select * from a where b = 2 for update;
a | b | c
----+---+----
1 | 2 | 3
(1 row)
Time: 4ms total (execution 4ms / network 0ms)
EXPLAIN ANALYZE:
demo@127.0.0.1:26257/defaultdb> explain analyze select * from a where b = 2 for update;
info
----------------------------------------------------------------------------------
planning time: 251µs
execution time: 1ms
distribution: local
vectorized: true
rows read from KV: 2 (27 B, 2 gRPC calls)
cumulative time spent in KV: 1ms
maximum memory usage: 50 KiB
network usage: 0 B (0 messages)
estimated RUs consumed: 0
• index join
│ nodes: n1
│ actual row count: 1
│ KV time: 391µs
│ KV contention time: 0µs
│ KV rows read: 1
│ KV bytes read: 15 B
│ KV gRPC calls: 1
│ estimated max memory allocated: 20 KiB
│ estimated max sql temp disk usage: 0 B
│ estimated row count: 1
│ table: a@a_pkey
│ locking strength: for update
│
└── • scan
nodes: n1
actual row count: 1
KV time: 738µs
KV contention time: 0µs
KV rows read: 1
KV bytes read: 12 B
KV gRPC calls: 1
estimated max memory allocated: 20 KiB
estimated row count: 1 (100% of the table; stats collected 1 minute ago)
table: a@a_b_key
spans: [/2 - /2]
locking strength: for update
(37 rows)
Time: 3ms total (execution 3ms / network 0ms)
I think the difference here is that the streamer uses a LeafTxn (since there might be concurrency going on) whereas the non-streamer code path uses the RootTxn. (I confirmed this by hacking the streamer code to also use the root, and the problem disappeared.)
It seems that the locks taken via the leaf txn are only released about 4.5s after the txn that acquired them committed. Maybe this was already mentioned elsewhere, but the problematic stmt is not the second statement that blocks (which can actually use either the streamer or the non-streamer code path), but the first stmt, which used the streamer.
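A sketch of that claim as a session (hypothetical, not taken from the thread; assumes the cluster setting change takes effect for the second statement): the first FOR UPDATE runs through the streamer and leaves the stale lock behind, and the follow-up statement then blocks even though it no longer uses the streamer itself.

```sql
SET CLUSTER SETTING sql.distsql.use_streamer.enabled = true;
SELECT * FROM a WHERE b = 2 FOR UPDATE;  -- fast, but acquires locks via the leaf txn
SET CLUSTER SETTING sql.distsql.use_streamer.enabled = false;
SELECT * FROM a WHERE b = 2 FOR UPDATE;  -- expected to block ~4.5s on the stale lock
```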
I think we need KV expertise here. Perhaps @nvanbenschoten can share some wisdom.
@jordanlewis what's your take on the severity of this issue? Should we block 22.2.2 release? Should we consider disabling the streamer in 22.2.2? I'm tentatively adding the corresponding release blocker labels.
~~I thought the API contract was that a LeafTxn cannot perform writes, including placing locks? That's what I remember from the txn / txncoordsender tech note.~~
Nvm, it can place lock spans. But they need to be "imported" into the RootTxn at the end.
> But they need to be "imported" into the RootTxn at the end.
That should happen during metadata draining, where GetLeafTxnFinalState is called by the index join and then UpdateRootWithLeafFinalState is called in the DistSQLReceiver, right? But the result of GetLeafTxnFinalState has no lock spans when I try the example.
GetLeafTxnFinalState is only about the refresh spans (at least right now). Perhaps a gap was introduced when FOR UPDATE was implemented: namely, it was assumed that we'd always be using the root txn (since FOR UPDATE implies that the txn will be performing writes, which would prohibit the usage of the leaf txn), but this assumption is not verified, and the usage of the streamer breaks it.
Probably the best quick fix is to examine the plan to see whether non-default key-locking is used and to not use the streamer if so. I'll work on that today.
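For reference, non-default key locking is visible in the plan output, so the check described above roughly corresponds to what the plans already show (a sketch using the repro table; not a description of the actual implementation):

```sql
-- Should show a "locking strength: for update" line on the index join and the scan,
-- as in the EXPLAIN ANALYZE output earlier in the thread:
EXPLAIN SELECT * FROM a WHERE b = 2 FOR UPDATE;
-- Same query with default locking strength; the streamer would still be allowed:
EXPLAIN SELECT * FROM a WHERE b = 2;
```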
Is it not simpler to extend GetLeafTxnFinalState to also include the lock spans if there are any?
Thanks. Indeed, that seems simple enough.
Maybe double check that UpdateRootWithLeafFinalState merges the lock spans too.
@Nican thanks for filing the issue, and @jordanlewis thanks for determining the concise reproduction. We have merged the fixes to master and 22.2 branches, and it should be included in the 22.2.2 release.
This issue now becomes about figuring out how to lift the restriction introduced in #94399. In particular, we either need to propagate the lock spans across leaf txns or re-implement how we do row-level locking. I'll update the issue description accordingly.
With #94399 we no longer choose to use the leaf txn (which would allow us to use the streamer API and parallelize local scans in some cases) if we find non-default key locking strength. This stems from the fact that we don't propagate lock spans for leaf txns, and even if we did, we'd run into another complication (quoting Nathan):
We should figure out how to lift this restriction.
**Original issue description**

*Edit from @jordanlewis: See https://github.com/cockroachdb/cockroach/issues/94290#issuecomment-1366343646 for a trivial repro.*

*To work around this problem in 22.2.0 and 22.2.1, run the following:*

```
SET CLUSTER SETTING sql.distsql.use_streamer.enabled = false;
```

----

**Describe the problem**

It looks like `v22.2.1` has a regression with `FOR UPDATE`: a lock appears to be held on the row for 5 seconds after the transaction finishes. Several of my e2e tests are now failing due to timeout.

I drilled into one of the tests. Running EXPLAIN ANALYZE on the select query after the transaction finishes shows `cumulative time spent in KV: 4.9s` and `cumulative time spent due to contention: 4.9s`. Also attached is a debug zip from EXPLAIN ANALYZE (DEBUG) of the query.

Removing `for update` fixes the issue, OR reverting to `v22.1.8` fixes the issue.

**To Reproduce**

The test on my codebase fails pretty consistently. From what I can tell, the sequence of events is (a hedged SQL sketch follows this description):

1. Transaction starts.
2. Transaction runs `select * from tbl where id = 1 for update`.
3. The row in `tbl` is NOT updated inside of the transaction, but other data in unrelated tables is updated/inserted.
4. Transaction completes successfully.
5. (only run after the transaction is complete) `select * from tbl` now takes 5 seconds to run.

**Expected behavior**

Running `SELECT * FROM tbl;` when there is no other workload on the database should take a few milliseconds.

**Additional data / screenshots**

[stmt-bundle-825387661464305665.zip](https://github.com/cockroachdb/cockroach/files/10300490/stmt-bundle-825387661464305665.zip)

**Environment:**

- CockroachDB version v22.2.1
- Server OS: Linux on Docker
- Client app: Sequelize

**Additional context**

E2E tests (including clicking on buttons) are failing due to timeout.

Jira issue: CRDB-22800
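A hedged SQL sketch of the five-step sequence above (table names, columns, and values are hypothetical; per the diagnosis earlier in the thread, the FOR UPDATE has to go through a secondary index, forcing an index join, for the stale-lock behavior to appear, so one is included here):

```sql
CREATE TABLE tbl (id INT PRIMARY KEY, ext_id INT, note STRING, UNIQUE INDEX (ext_id));
CREATE TABLE unrelated (id INT PRIMARY KEY, note STRING);
INSERT INTO tbl VALUES (1, 10, 'x');

-- Steps 1-4: lock the row via the secondary index, write only to unrelated tables.
BEGIN;
SELECT * FROM tbl WHERE ext_id = 10 FOR UPDATE;
INSERT INTO unrelated VALUES (1, 'y');
COMMIT;

-- Step 5: on the affected versions this reportedly stalls for ~5 seconds.
SELECT * FROM tbl;
```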