cockroachdb / cockroach

kv/kvclient/rangefeed: TestDBClientScan failed #129314

Closed cockroach-teamcity closed 1 month ago

cockroach-teamcity commented 2 months ago

kv/kvclient/rangefeed.TestDBClientScan failed with artifacts on release-23.2 @ b5a9e987af5f8b8eabbaa31db2b19ec270099d0f:

=== RUN   TestDBClientScan
    test_log_scope.go:170: test logs captured to: /artifacts/tmp/_tmp/8a396ce5ffb24698ad44a501ceca79c0/logTestDBClientScan1383858133
    test_log_scope.go:81: use -show-logs to present logs inline
    test_server_shim.go:159: automatically injected a shared process virtual cluster under test; see comment at top of test_server_shim.go for details.
    db_adapter_external_test.go:301: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/8a396ce5ffb24698ad44a501ceca79c0/logTestDBClientScan1383858133
--- FAIL: TestDBClientScan (62.69s)
=== RUN   TestDBClientScan/parallel_scan_requests
    db_adapter_external_test.go:170: range cache has 4 ranges: [r68:/Tenant/10{-/Table/104/1/250} [(n1,s1):1, next=2, gen=4, sticky=1724155585.614816184,0] r71:/Tenant/10/Table/104/1/{250-500} [(n1,s1):1, next=2, gen=5, sticky=9223372036.854775807,2147483647] r72:/Tenant/10/Table/104/1/{500-750} [(n1,s1):1, next=2, gen=6, sticky=9223372036.854775807,2147483647] r73:/Tenant/10/Table/{104/1/750-Max} [(n1,s1):1, next=2, gen=6, sticky=9223372036.854775807,2147483647]]
    db_adapter_external_test.go:189: completed scan for /Tenant/10/Table/104/{1-2}
    db_adapter_external_test.go:197: condition failed to evaluate within 45s: from db_adapter_external_test.go:199: still waiting for barrier (1/3)
    --- FAIL: TestDBClientScan/parallel_scan_requests (46.44s)

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/replication

Jira issue: CRDB-41671

andrewbaptist commented 2 months ago

This doesn't appear to be a release blocker. The test assumes there will be at least 3 scan requests because the table is split into 4 ranges (0-250, 250-500, 500-750, and 750+). In this run it ended up doing a single scan that covered all 4 ranges, so the barrier only ever saw 1 of the 3 requests it was waiting for.

This doesn't seem like incorrect behavior, but it is unexpected and inefficient, since we had 4 ranges and 3 workers. I will spend some time later this week trying to bisect to see whether a recent change made this more flaky.

andrewbaptist commented 2 months ago

This has happened before, and an attempt to fix it was made in https://github.com/cockroachdb/cockroach/pull/118881, but that fix still seems insufficient. This is a rare flake, but it should be addressed eventually.

github-actions[bot] commented 1 month ago

We have marked this test failure issue as stale because it has been inactive for 1 month. If this failure is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 5 days to keep the test failure queue tidy.