cockroachdb / cockroach

kv/kvserver/batcheval: TestDeleteRangeTombstoneSetsGCHint failed #128752

Open cockroach-teamcity opened 1 month ago

cockroach-teamcity commented 1 month ago

kv/kvserver/batcheval.TestDeleteRangeTombstoneSetsGCHint failed with artifacts on release-23.2.9-rc @ 579c0a000fb80ef7184e46ea038937c135ee2f0c:

=== RUN   TestDeleteRangeTombstoneSetsGCHint
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/e1ab32a24518a46d10d3b8d78552a36a/logTestDeleteRangeTombstoneSetsGCHint4138317912
    test_log_scope.go:81: use -show-logs to present logs inline
    test_server_shim.go:159: automatically injected an external process virtual cluster under test; see comment at top of test_server_shim.go for details.
    conditional_wrap.go:190: 
        pkg/kv/kvserver/batcheval_test/pkg/kv/kvserver/batcheval/cmd_delete_range_gchint_test.go:65: (TestDeleteRangeTombstoneSetsGCHint)
            NOTICE: .LookupRange() called via implicit interface StorageLayerInterface;
        HINT: consider using .StorageLayer().LookupRange() instead.
    cmd_delete_range_gchint_test.go:77: aborted in DistSender: result is ambiguous: node unavailable; try another peer
    panic.go:523: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/e1ab32a24518a46d10d3b8d78552a36a/logTestDeleteRangeTombstoneSetsGCHint4138317912
--- FAIL: TestDeleteRangeTombstoneSetsGCHint (12.54s)

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

Jira issue: CRDB-41181

pav-kv commented 1 month ago

Duplicate of #120982

pav-kv commented 1 month ago

This failure similarly logs a rangefeed error:

W240810 14:00:06.623283 6963931 kv/kvclient/rangefeed/rangefeed.go:327 ⋮ [T10,Vcluster-10,nsql1,rangefeed=‹lease›] 301  rangefeed failed 0 times, restarting: received unexpected rangefeed DeleteRange event with no OnDeleteRange handler: ‹delete_range:<span:<key:"\376\222\213" end_key:"\376\222\214" > timestamp:<wall_time:1723298406598742584 > > ›

Not sure if it's related: some successful runs print this error and some don't.

pav-kv commented 1 month ago

W240810 14:00:06.627423 7357769 kv/kvserver/intentresolver/intent_resolver.go:867 ⋮ [-] 319  failed to gc transaction record: could not GC completed transaction anchored at /Tenant/10/Table/15/1/‹993617530126565377›: node unavailable; try another peer

Similar errors also show up in successful runs, though.

pav-kv commented 1 month ago

The kv-distribution log is full of:

I240810 13:59:58.333557 6721494 13@kv/kvserver/replicate_queue.go:785 ⋮ [T1,Vsystem,n1,replicate,s1,r1/1:‹/{Min-System/NodeL…}›] 1  error processing replica: ‹0 of 1 live stores are able to take a new replica for the range (1 already has a voter, 0 already have a non-voter); likely not enough nodes in cluster›

github-actions[bot] commented 3 days ago

We have marked this test failure issue as stale because it has been inactive for 1 month. If this failure is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the test failure queue tidy.