Closed cockroach-teamcity closed 1 year ago
kv/kvserver.TestClosedTimestampCantServeWithConflictingIntent failed with artifacts on master @ 3b1bc872bb4bf8e5eb2ad8790ba8ec781167dd1f:
=== RUN TestClosedTimestampCantServeWithConflictingIntent
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent3629078817
test_log_scope.go:79: use -show-logs to present logs inline
closed_timestamp_test.go:337: request unexpectedly returned, should block; err: [NotLeaseHolderError] lease held by different store; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n3,s3):2 seq=2 start=1682681331.525769256,0 exp=1682681337.524812851,0 pro=1682681331.524812851,0
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent3629078817
--- FAIL: TestClosedTimestampCantServeWithConflictingIntent (40.36s)
Parameters: TAGS=bazel,gss,race
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
- #100053 kv/kvserver: TestClosedTimestampCantServeWithConflictingIntent failed [C-test-failure O-robot branch-release-22.1]
kv/kvserver.TestClosedTimestampCantServeWithConflictingIntent failed with artifacts on master @ 4619767bb0b75ac85ddcb76e33c73211e369afed:
=== RUN TestClosedTimestampCantServeWithConflictingIntent
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent2101771649
test_log_scope.go:79: use -show-logs to present logs inline
closed_timestamp_test.go:351:
Error Trace: github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/closed_timestamp_test.go:351
Error: Received unexpected error:
[NotLeaseHolderError] lease held by different store; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=2 start=1682767595.564162603,0 exp=1682767601.563077486,0 pro=1682767595.563077486,0
Test: TestClosedTimestampCantServeWithConflictingIntent
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent2101771649
--- FAIL: TestClosedTimestampCantServeWithConflictingIntent (40.83s)
Parameters: TAGS=bazel,gss,race
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
- #100053 kv/kvserver: TestClosedTimestampCantServeWithConflictingIntent failed [C-test-failure O-robot branch-release-22.1]
kv/kvserver.TestClosedTimestampCantServeWithConflictingIntent failed with artifacts on master @ 69875da00208dd792cb92aee54811e8a660773a9:
=== RUN TestClosedTimestampCantServeWithConflictingIntent
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent464149897
test_log_scope.go:79: use -show-logs to present logs inline
closed_timestamp_test.go:351:
Error Trace: github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/closed_timestamp_test.go:351
Error: Received unexpected error:
[NotLeaseHolderError] lease held by different store; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n3,s3):3 seq=2 start=1684314268.194372668,0 exp=1684314274.193422745,0 pro=1684314268.193422745,0
Test: TestClosedTimestampCantServeWithConflictingIntent
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent464149897
--- FAIL: TestClosedTimestampCantServeWithConflictingIntent (34.98s)
Parameters: TAGS=bazel,gss,race
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
- #100053 kv/kvserver: TestClosedTimestampCantServeWithConflictingIntent failed [C-test-failure O-robot branch-release-22.1]
All of these failures use expiration-based leases under race, and probably aren't able to maintain the leases. Submitting a PR to extend the lease duration.
In https://teamcity.cockroachdb.com/viewLog.html?buildId=10875307&tab=buildResultsDiv&buildTypeId=Cockroach_Ci_Tests_Testrace on https://github.com/cockroachdb/cockroach/pull/106145
=== RUN TestClosedTimestampCantServeWithConflictingIntent
test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent1610110799
test_log_scope.go:81: use -show-logs to present logs inline
closed_timestamp_test.go:342: request unexpectedly returned, should block; err: [NotLeaseHolderError] lease held by different store; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n3,s3):3 seq=2 start=1689125159.587330435,0 exp=1689125219.586585395,0 pro=1689125159.586585395,0
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestClosedTimestampCantServeWithConflictingIntent1610110799
--- FAIL: TestClosedTimestampCantServeWithConflictingIntent (35.37s)
Note that exp-start
is 60s
, and the test still flakes.
@andrewbaptist Are you actively working on this? If not, should repl pick it up?
Repl can pick this up if you have time.
The error is here on the last line:
That is, we just^1 follower-read successfully at MinTimestamp.Prev()
on all replicas and are now expecting to block when we read at MinTimestamp
(since there's an intent, presumably). However, instead of that we get a NLHE.
I can imagine two reasons why that is: either the txn got aborted (however the test sets a long txn liveness threshold to avoid this^2) or there was some lease change that now made the follower read impossible in the first place (so it never got to bump into the intent).
Some notes on the logs^3 of this test: we spend >25s waiting for full replication. Then, pretty much exactly when the test also starts quiescing, we see a split
I230712 01:25:59.582913 1451694 kv/kvserver/replica_command.go:439 ⋮ [T1,n1,split,s1,r63/1:‹/Table/{62-106/1/0}›] 411 initiating a split of this range at key /Table/106 [r65] (‹span config›)
but note that our range is r64, so this is not anywhere our test reads (and even if it were, I don't know how it would be relevant).
I was unable to repro easily on my gceworker with
./dev test --race --filter TestLeaseTransferRejectedIfTargetNeedsSnapshot --stress ./pkg/kv/kvserver/ 2>&1 | tee stress.log
48 runs so far, 0 failures, over 5m40s
Ah nevermind this explains it
I230712 01:25:59.586438 1451847 13@kv/kvserver/replicate_queue.go:1031 ⋮ [T1,n1,replicate,s1,r64/1:‹/{Table/106/1/0-Max}›] 1104 transferring lease to s3
The test expects leases to be stable. But its using ReplicationAuto
so leases can be transferred - another case of https://github.com/cockroachdb/cockroach/issues/107524 though we can be less obviously incorrect in this test.
Note that https://github.com/cockroachdb/cockroach/issues/107179 also uses the helper that sets up this cluster, so it too might see errant lease transfers, perhaps they are sharing a root cause.
kv/kvserver.TestClosedTimestampCantServeWithConflictingIntent failed with artifacts on master @ a4ab6b6243f509e62993b4d81855f23470d9901a:
Parameters:
TAGS=bazel,gss,race
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Same failure on other branches
- #100053 kv/kvserver: TestClosedTimestampCantServeWithConflictingIntent failed [C-test-failure O-robot branch-release-22.1]
/cc @cockroachdb/replication
This test on roachdash | Improve this report!
Jira issue: CRDB-27142