Closed cockroach-teamcity closed 2 years ago
ccl/multiregionccl.TestMultiRegionDataDriven failed with artifacts on master @ 912964e02ddd951c77d4f71981ae18b3894e9084:
=== RUN TestMultiRegionDataDriven
test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMultiRegionDataDriven3462472544
test_log_scope.go:80: use -show-logs to present logs inline
=== CONT TestMultiRegionDataDriven
datadriven_test.go:390: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMultiRegionDataDriven3462472544
--- FAIL: TestMultiRegionDataDriven (97.05s)
=== RUN TestMultiRegionDataDriven/regional_by_table
datadriven_test.go:122:
testdata/regional_by_table:101: SELECT * FROM db.rbt WHERE k = 1
expected:
served locally: true
served via follower read: false
found:
served locally: false
--- FAIL: TestMultiRegionDataDriven/regional_by_table (11.06s)
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM) Parameters in this failure: - GOFLAGS=-parallel=4
- #73829 ccl/multiregionccl: TestMultiRegionDataDriven failed [C-test-failure O-robot branch-release-21.2]
ccl/multiregionccl.TestMultiRegionDataDriven failed with artifacts on master @ 9b0065ca2450205a34d4237be7b317c2d895658c:
=== RUN TestMultiRegionDataDriven
test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMultiRegionDataDriven2898876028
test_log_scope.go:80: use -show-logs to present logs inline
=== CONT TestMultiRegionDataDriven
datadriven_test.go:390: -- test log scope end --
test logs left over in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestMultiRegionDataDriven2898876028
--- FAIL: TestMultiRegionDataDriven (41.73s)
=== RUN TestMultiRegionDataDriven/regional_by_table
datadriven_test.go:122:
testdata/regional_by_table:154: SELECT * FROM db.rbt AS OF SYSTEM TIME follower_read_timestamp() WHERE k = 1
expected:
served locally: false
found:
served locally: true
served via follower read: true
--- FAIL: TestMultiRegionDataDriven/regional_by_table (12.23s)
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM) Parameters in this failure: - GOFLAGS=-parallel=4
- #73829 ccl/multiregionccl: TestMultiRegionDataDriven failed [C-test-failure O-robot branch-release-21.2]
ccl/multiregionccl.TestMultiRegionDataDriven failed with artifacts on master @ e1068d77afbd39b162978281c9da7cbea49c1c3a:
../../util/stop/stopper.go:327 stop.(*Stopper).RunTask { f(ctx) }
../../server/node.go:863 server.(*Node).writeNodeStatus { err = n.recorder.WriteNodeStatus(ctx, n.storeCfg.DB, *nodeStatus, mustExist) }
../../server/node.go:793 server.(*Node).startWriteNodeStatus { // node decommissioning, so we have to error out if we can't create it. }
../../server/server.go:1802 server.(*Server).PreStart { // Begin recording status summaries. }
../../server/server.go:1253 server.(*Server).Start { func (s *Server) Start(ctx context.Context) error { }
../../server/testserver.go:499 server.(*TestServer).Start { func (ts *TestServer) Start(ctx context.Context) error { }
../../testutils/testcluster/testcluster.go:485 testcluster.(*TestCluster).startServer { server := tc.Servers[idx] }
../../testutils/testcluster/testcluster.go:315 testcluster.(*TestCluster).Start { } else { }
../../testutils/testcluster/testcluster.go:185 testcluster.StartTestCluster { cluster.Start(t) }
datadriven_test.go:164 multiregionccl_test.TestMultiRegionDataDriven.func1.1 { }) }
../../../vendor/github.com/cockroachdb/datadriven/datadriven.go:321 datadriven.runDirective.func1 { actual := f(t, d) }
../../../vendor/github.com/cockroachdb/datadriven/datadriven.go:354 datadriven.runDirective { expectedLines := difflib.SplitLines(d.Expected) }
../../../vendor/github.com/cockroachdb/datadriven/datadriven.go:196 datadriven.runDirectiveOrSubTest { } }
../../../vendor/github.com/cockroachdb/datadriven/datadriven.go:166 datadriven.runTestInternal { r := newTestDataReader(t, sourceName, reader, rewrite) }
../../../vendor/github.com/cockroachdb/datadriven/datadriven.go:136 datadriven.RunTest { rewriteData := runTestInternal(t, path, file, f, *rewriteTestFiles) }
datadriven_test.go:388 multiregionccl_test.TestMultiRegionDataDriven.func1 { }) }
../../../vendor/github.com/cockroachdb/datadriven/datadriven.go:413 datadriven.Walk { f(t, path) }
../../../vendor/github.com/cockroachdb/datadriven/datadriven.go:426 datadriven.Walk.func1 { Walk(t, filepath.Join(path, file.Name()), f) }
goroutine 38921 lock 0xc0040ecca8
../../util/tracing/crdbspan.go:360 tracing.(*crdbSpan).getVerboseRecording { // grandchildren there are. } <<<<<
../../util/tracing/crdbspan.go:357 tracing.(*crdbSpan).getVerboseRecording { }
../../util/tracing/crdbspan.go:338 tracing.(*crdbSpan).getRecordingImpl { case RecordingVerbose: }
../../util/tracing/crdbspan.go:779 tracing.(*crdbSpan).childFinished { rec = child.GetRecording(RecordingVerbose, false /* finishing - the child is already finished */) }
../../util/tracing/crdbspan.go:321 tracing.(*crdbSpan).GetRecording { func (s *crdbSpan) GetRecording(recType RecordingType, finishing bool) Recording { }
../../util/tracing/crdbspan.go:242 tracing.(*crdbSpan).finish { parent.Span.i.crdb.childFinished(s) }
../../util/tracing/span_inner.go:91 tracing.(*spanInner).Finish { }
../../util/tracing/span_inner.go:226 tracing.(*spanInner).Tracer { func (s *spanInner) Tracer() *Tracer { }
../../util/tracing/span.go:246 tracing.(*Span).Finish { sp.finishInternal() }
../../kv/kvserver/replica_range_lease.go:480 kvserver.(*pendingLeaseRequest).requestLeaseAsync.func2 { p.cancelLocked() }
../../util/stop/stopper.go:494 stop.(*Stopper).RunAsyncTaskEx.func2 { f(ctx) }
goroutine 38921 lock 0xc0040edba8
../../util/tracing/crdbspan.go:726 tracing.(*crdbSpan).childFinished { s.mu.Lock() } <<<<<
../../util/tracing/crdbspan.go:725 tracing.(*crdbSpan).childFinished { func (s *crdbSpan) childFinished(child *crdbSpan) { }
../../util/tracing/crdbspan.go:242 tracing.(*crdbSpan).finish { parent.Span.i.crdb.childFinished(s) }
../../util/tracing/span_inner.go:91 tracing.(*spanInner).Finish { }
../../util/tracing/span_inner.go:226 tracing.(*spanInner).Tracer { func (s *spanInner) Tracer() *Tracer { }
../../util/tracing/span.go:246 tracing.(*Span).Finish { sp.finishInternal() }
../../kv/kvserver/replica_range_lease.go:480 kvserver.(*pendingLeaseRequest).requestLeaseAsync.func2 { p.cancelLocked() }
../../util/stop/stopper.go:494 stop.(*Stopper).RunAsyncTaskEx.func2 { f(ctx) }
ERROR: exit status 2
1 runs completed, 1 failures, over 1s
context canceled
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM) Parameters in this failure: - TAGS=deadlock - GOFLAGS=-parallel=4
- #73829 ccl/multiregionccl: TestMultiRegionDataDriven failed [C-test-failure O-robot branch-release-21.2]
(Ignoring the last tracing failure that Andrei fixed)
These tests have been failing for a while, even before #73876 (which certainly couldn't have helped with its additional asynchrony). Here's one from 21.2: #73829. +cc #75543 (a bazel dup?). Looking at the test directives, we seem make use of arbitrary sleeps, perhaps that's why?
In #75304 we observed the need to sometimes to sometimes retry the query itself to accommodate stale distsender caches, our parallel here being this directive, so perhaps that's not as much of a problem here:
For the "wait-for-zone-config-changes" primitive, we can transfer the leaseholder by issuing a one-off transfer request, which I don't think guarantees that the lease would stay pinned to the destination. Likely a red-herring for these failures, but worth noting I think.
I'm not planning on looking at these tests further unless pushed. +cc @arulajmani / @ajstorm / @nvanbenschoten: should we skip these tests on master? Do we think we'll have the time to polish them up? While they do provide good coverage, I do think it'd take time to make them a bit more robust, and in relation to #73876 at least, I'm not expecting any work here catch bugs as much as ensure this test is flake-proof.
hasn't triggered for over a month
hasn't triggered for over a month
They have on release-22.1 which I guess Arul's currently assigned to (#77908). I suspect this will just fail again but happy connect the history back to this issue when it happens. This is a pretty long running test, 45s+ at times, if it's not flaking in CI it's also possible because it's just not getting stressed enough. I hope we can either rewrite this test to be faster or rip it out entirely.
ccl/multiregionccl.TestMultiRegionDataDriven failed with artifacts on master @ c3d71ac887844bef174abb6dab2a4e1ce9270ab7:
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM) Parameters in this failure: - GOFLAGS=-parallel=4
Same failure on other branches
- #73829 ccl/multiregionccl: TestMultiRegionDataDriven failed [C-test-failure O-robot branch-release-21.2]
/cc @cockroachdb/multiregion
This test on roachdash | Improve this report!
Jira issue: CRDB-12258