Open pcsegal opened 1 year ago
Hello, I am Blathers. I am here to help you get the issue triaged.
Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Hi @pcsegal, thanks for the detailed reproduction steps! I can easily reproduce the issue you're describing.
I'm going to pass this over to the kv team, since looking at the trace for one of the queries taking over 300ms, it seems like all the time is spent "waiting to acquire write latch":
Here is the statement bundle I collected on my machine in case it's helpful. stmt-bundle-842282716587982849.zip
cc @michae2 in case this has changed with the read committed work
Describe the problem
select for update with skip locked performs slowly when there is a high number of concurrent queries.
To Reproduce
To make the issue easier to reproduce, I am posting a Go example.
In this example, I am trying to model the following situation: there is a set of items, and a set of workers. Each worker can claim an item, and each item can be claimed by only one worker at a time. Once a worker claims an item, the claim expires after a fixed amount of time (e.g., 5 minutes). After the claim for a given item expires, other workers can claim that item.
The items table models the items. In order to claim an item, each worker runs a select for update ... skip locked query followed by an update query, in one transaction.
The sample code is below. It creates 1000 items and spins up 500 workers.
Observed behavior:
Running the above example, the output on my local machine was the following:
Additionally, the CockroachDB WebUI's "SQL Activity" tab shows the following:
select for update ... skip locked query: 311.6 ms;
update query: 306.6 ms.
Expected behavior:
I would have expected a lower mean latency, on the order of tens of milliseconds rather than hundreds of milliseconds.
Environment:
github.com/jackc/pgx/v4 v4.18.0.
Jira issue: CRDB-24536