txnwait: confusing behavior around priorities

cockroach-teamcity commented 5 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/0dba537ae88e495ddf29b4c347b4c30ee99bd046

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestPushTxnHeartbeatTimeout PKG=github.com/cockroachdb/cockroach/pkg/storage TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=985354&tab=buildLog

=== RUN   TestPushTxnHeartbeatTimeout
I181025 05:54:46.763449 24718 storage/store.go:1547  [s1] [n1,s1]: failed initial metrics computation: [n1,s1]: system config not yet available
--- FAIL: TestPushTxnHeartbeatTimeout (0.02s)
    replica_test.go:5613: 9: expSuccess=false; got pErr <nil>, args={RequestHeader:{Key:"key-9" EndKey:/Min Sequence:0} PusherTxn:"pusher" id=e7955490 key="key-9" rw=false pri=0.00733603 iso=SERIALIZABLE stat=PENDING epo=0 ts=0.000000123,124 orig=0.000000123,124 max=0.000000124,124 wto=false rop=false seq=0 PusheeTxn:{ID:bba4a87f-571b-4a8a-80b8-9d555b6b37d4 Isolation:SERIALIZABLE Key:[107 101 121 45 57] Epoch:0 Timestamp:0.000000123,123 Priority:0 Sequence:1 DeprecatedBatchIndex:0} PushTo:2.000000123,123 Now:2.000000123,123 PushType:PUSH_TIMESTAMP Force:false}, reply=header:<> pushee_txn:<meta:<id:<bba4a87f-571b-4a8a-80b8-9d555b6b37d4> key:"key-9" timestamp:<wall_time:2000000123 logical:124 > sequence:1 > name:"test-9" last_heartbeat:<wall_time:123 logical:127 > orig_timestamp:<wall_time:123 logical:123 > max_timestamp:<wall_time:124 logical:123 > refreshed_timestamp:<> epoch_zero_timestamp:<> >

Jira issue: CRDB-4777

tbg commented 5 years ago

^- release-2.1

petermattis commented 5 years ago

I'm unable to reproduce on master using make stress PKG=./storage/ TESTS=TestPushTxnHeartbeatTimeout. Trying release-2.1 now.

petermattis commented 5 years ago

Reproduced in a little over a minute on release-2.1:

~ make stress PKG=./storage/ TESTS=TestPushTxnHeartbeatTimeout
...
--- FAIL: TestPushTxnHeartbeatTimeout (0.02s)
    replica_test.go:5613: 4: expSuccess=false; got pErr <nil>, args={RequestHeader:{Key:"key-4" EndKey:/Min Sequence:0} PusherTxn:"pusher" id=1e0eef74 key="key-4" rw=false pri=0.01352606 iso=SERIALIZABLE stat=PENDING epo=0 ts=0.000000123,74 orig=0.000000123,74 max=0.000000124,74 wto=false rop=false seq=0 PusheeTxn:{ID:a364f77e-d29f-4b0a-92de-aca445e76d0b Isolation:SERIALIZABLE Key:[107 101 121 45 52] Epoch:0 Timestamp:0.000000123,73 Priority:0 Sequence:1 DeprecatedBatchIndex:0} PushTo:1.000000123,73 Now:1.000000123,73 PushType:PUSH_ABORT Force:false}, reply=header:<> pushee_txn:<meta:<id:<a364f77e-d29f-4b0a-92de-aca445e76d0b> key:"key-4" timestamp:<wall_time:123 logical:77 > sequence:1 > name:"test-4" status:ABORTED last_heartbeat:<wall_time:123 logical:77 > orig_timestamp:<wall_time:123 logical:73 > max_timestamp:<wall_time:124 logical:73 > refreshed_timestamp:<> epoch_zero_timestamp:<> >
FAIL

ERROR: exit status 1

4205 runs completed, 1 failures, over 1m16s

I'm not terribly familiar with this code. Seems a bit worrisome for this to be failing. @andreimatei and @tschottdorf can you also take a look?

petermattis commented 5 years ago

Took another crack at reproducing this on master this morning and had success:

--- FAIL: TestPushTxnHeartbeatTimeout (0.02s)
    replica_test.go:5711: 7: expSuccess=false; got pErr <nil>, args={RequestHeader:{Key:"key-7" EndKey:/Min Sequence:0} PusherTxn:"pusher" id=846652a6 key="key-7" rw=false pri=0.01264969 iso=SERIALIZABLE stat=PENDING epo=0 ts=0.000000123,108 orig=0.000000123,108 max=0.000000124,108 wto=false rop=false seq=0 PusheeTxn:{ID:091630d6-1485-4166-88c0-8c4b2fe8e545 Isolation:SERIALIZABLE Key:[107 101 121 45 55] Epoch:0 Timestamp:0.000000123,107 Priority:0 Sequence:1 DeprecatedBatchIndex:0} PushTo:2.000000122,107 Now:2.000000122,107 PushType:PUSH_ABORT Force:false}, reply=header:<> pushee_txn:<meta:<id:<091630d6-1485-4166-88c0-8c4b2fe8e545> key:"key-7" timestamp:<wall_time:123 logical:111 > sequence:1 > name:"test-7" status:ABORTED last_heartbeat:<wall_time:123 logical:111 > orig_timestamp:<wall_time:123 logical:107 > max_timestamp:<wall_time:124 logical:107 > refreshed_timestamp:<> epoch_zero_timestamp:<> >
FAIL

So whatever is going on, it wasn't fixed on master recently.

tbg commented 5 years ago

Taking a look.

tbg commented 5 years ago

Looks like the code below does it. We're probably setting up priorities that have some randomness in them.

    case CanPushWithPriority(&args.PusherTxn, &reply.PusheeTxn):
        reason = "pusher has priority"
        pusherWins = true

tbg commented 5 years ago

Reopening to investigate based on @andreimatei's confusion in https://github.com/cockroachdb/cockroach/pull/31965.

tbg commented 5 years ago

Following up on @andreimatei's https://github.com/cockroachdb/cockroach/pull/31965#issuecomment-433992581:

MakePriority(1) returns a randomized priority. It's only when you specify {Min,Max}UserPriority that you get "fixed" ones.

This seems wrong, but it also seems that that's what you get in the SQL layer:

https://github.com/cockroachdb/cockroach/blob/552aabd034f595f2b54cc2c1826fc2fb1b3fcaaf/pkg/sql/conn_executor.go#L1716-L1731

What saves us, I think, is this code:

https://github.com/cockroachdb/cockroach/blob/c32a3790c283ec8401d0a776ba0d141cc3d56dab/pkg/storage/txnwait/txnqueue.go#L43-L56

We're going to go into the queue unless the priority is either of {Min,Max}UserPriority. But if there's nothing in the queue, we'll go to push immediately:

https://github.com/cockroachdb/cockroach/blob/c32a3790c283ec8401d0a776ba0d141cc3d56dab/pkg/storage/txnwait/txnqueue.go#L387-L393

This smells. Say you have txns txn1 and txn2 and txn2 has higher priority (but they're both "equally good" sql transactions. If txn2 runs into an intent of txn2, it'll just push it and abort txn1. But if some other txn txn3 with a priority lower than txn1 tries to push txn1 first, it'll get a failure and put txn1 in the txnWaitQueue and now if txn2 comes along it'll enqueue and wait for txn1 to complete. This behavior seems pretty unsubstantiated.

I wonder if my reading of this is just wrong (but I spent ~20min looking at it and am fairly sure). I also noticed that the testing doesn't exercise this at all because it enqueues the pushee manually before sending the pusher along. If it didn't do that, it'd also flake left and right because it also uses a user priority of 1.

cc @nvanbenschoten since this also has potential performance implications as I expect it to cause more spurious aborts than necessary.

nvanbenschoten commented 5 years ago

If txn2 runs into an intent of txn1, it'll just push it and abort txn1.

That's not my understanding. Unless txn1 has MinPriority or txn2 has MaxPriority, ShouldPushImmediately will return false and txn1 will be put in the txnWaitQueue. If the txn is not already in the txnWaitQueue then the PushTxnRequest will evaluate. However, what we're missing is CanPushWithPriority:

https://github.com/cockroachdb/cockroach/blob/7cb791ca9dca653f2cac898cf6296d8f9ee7e654/pkg/storage/batcheval/transaction.go#L88-L93

This logic ensures that evaluating the PushTxnRequest result in a TransactionPushError, which will result in txn1 being put in the txnWaitQueue.

tbg commented 5 years ago

Thanks for clarifying, @nvanbenschoten. This seems awfully roundabout, but at least it's working "as intended". Is there any particular reason we don't just assign some fixed number and make sure that it doesn't change? We're not using priorities for anything but breaking the deadlock (where we could also use anything else), do we?

nvanbenschoten commented 5 years ago

Is there any particular reason we don't just assign some fixed number and make sure that it doesn't change

I've wondered this as well. I don't know the history well enough to give an answer. Perhaps @andreimatei does? Seems like we could just as easily use txn orig timestamps to break deadlocks.

andreimatei commented 5 years ago

What I know about the history of the priorities is that they used to matter, and Radu put some fancy math into generating them, and then they stopped mattering and now we have more code than we need. Separately, one thing that I always saw as a wart was the fact that we have roachpb.UserPriority and also lower level priorities (naked ints). And various APIs convert between the two. And I think there was always chaos because some levels of the code that were dealing with restarts only new the naked int and not the original UserPriority although they would have liked to know it (specifically when dealing with restarts, I think), and I tried to make things a bit saner in various refactorings but perhaps I've made it worse. I definitely think a cleanup in this area is in order.

github-actions[bot] commented 3 years ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

cockroachdb / cockroach

txnwait: confusing behavior around priorities #31859