cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com

kv/kvserver: TestCrashWhileTruncatingSideloadedEntries failed #124845

Open cockroach-teamcity opened 1 month ago

cockroach-teamcity commented 1 month ago

kv/kvserver.TestCrashWhileTruncatingSideloadedEntries failed on master @ 99c0d277c26344a08903e1372ab62a034336d178:

                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServer
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1682
                              | github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestCrashWhileTruncatingSideloadedEntries
                              |     github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/client_raft_log_queue_test.go:354
                              | testing.tRunner
                              |     GOROOT/src/testing/testing.go:1689
                            Wraps: (4) L6: 000037: file 000037 (type 2) unknown to the objstorage provider: file does not exist
                                └─ Wraps: (5) attached stack trace
                              -- stack trace:
                              | github.com/cockroachdb/pebble.checkConsistency
                              |     github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:1288
                              | [...repeated from below...]
                                  └─ Wraps: (6) L6: 000037
                                    └─ Wraps: (7) attached stack trace
                              -- stack trace:
                              | github.com/cockroachdb/pebble/objstorage/objstorageprovider.(*provider).Lookup
                              |     github.com/cockroachdb/pebble/objstorage/objstorageprovider/external/com_github_cockroachdb_pebble/objstorage/objstorageprovider/provider.go:405
                              | github.com/cockroachdb/pebble.checkConsistency
                              |     github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:1279
                              | github.com/cockroachdb/pebble.Open
                              |     github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:393
                              | github.com/cockroachdb/cockroach/pkg/storage.newPebble
                              |     github.com/cockroachdb/cockroach/pkg/storage/pebble.go:1330
                              | github.com/cockroachdb/cockroach/pkg/storage.Open
                              |     github.com/cockroachdb/cockroach/pkg/storage/open.go:527
                              | github.com/cockroachdb/cockroach/pkg/server.(*Config).CreateEngines
                              |     github.com/cockroachdb/cockroach/pkg/server/config.go:865
                              | github.com/cockroachdb/cockroach/pkg/server.NewServer
                              |     github.com/cockroachdb/cockroach/pkg/server/server.go:297
                              | github.com/cockroachdb/cockroach/pkg/server.testServerFactoryImpl.New
                              |     github.com/cockroachdb/cockroach/pkg/server/testserver.go:2455
                              | github.com/cockroachdb/cockroach/pkg/testutils/serverutils.NewServer
                              |     github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:327
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServerWithInspect
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1738
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServer
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1682
                              | github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestCrashWhileTruncatingSideloadedEntries
                              |     github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/client_raft_log_queue_test.go:354
                              | testing.tRunner
                              |     GOROOT/src/testing/testing.go:1689
                              | runtime.goexit
                              |     src/runtime/asm_amd64.s:1695
                                      └─ Wraps: (8) file 000037 (type 2) unknown to the objstorage provider
                                        └─ Wraps: (9) file does not exist
                            Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *join.joinError (5) *withstack.withStack (6) *errutil.withPrefix (7) *withstack.withStack (8) *errutil.withPrefix (9) *errors.errorString
            Test:           TestCrashWhileTruncatingSideloadedEntries
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestCrashWhileTruncatingSideloadedEntries2090345312
--- FAIL: TestCrashWhileTruncatingSideloadedEntries (0.91s)

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

- #124468 kv/kvserver: TestCrashWhileTruncatingSideloadedEntries failed [C-test-failure O-robot T-kv-replication branch-release-24.1.0-rc release-blocker]

/cc @cockroachdb/replication

Jira issue: CRDB-39104

miraradeva commented 1 month ago

Duplicate of https://github.com/cockroachdb/cockroach/issues/124468.

jbowens commented 3 weeks ago

In this instance, failed to find L6 file 000037:

 client_raft_log_queue_test.go:354: 
            Error Trace:    github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/client_raft_log_queue_test.go:354
            Error:          Received unexpected error:
                            failed to create engines: L6: 000037: file 000037 (type 2) unknown to the objstorage provider: file does not exist
                            (1) attached stack trace
                              -- stack trace:
                              | github.com/cockroachdb/cockroach/pkg/server.NewServer
                              |     github.com/cockroachdb/cockroach/pkg/server/server.go:299
                              | github.com/cockroachdb/cockroach/pkg/server.testServerFactoryImpl.New
                              |     github.com/cockroachdb/cockroach/pkg/server/testserver.go:2455
                              | github.com/cockroachdb/cockroach/pkg/testutils/serverutils.NewServer
                              |     github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:327
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServerWithInspect
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1738
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServer
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1682
                              | github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestCrashWhileTruncatingSideloadedEntries
                              |     github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/client_raft_log_queue_test.go:354
                              | testing.tRunner
                              |     GOROOT/src/testing/testing.go:1689

Post-restart, the directory looks like:

    client_raft_log_queue_test.go:353: FS after restart:
                  /
             832    000002.log
            1191    000004.sst
            1500    000005.sst
             958    000006.sst
             934    000007.sst
            1035    000008.sst
              52    000010.log
           10815    000011.log
            1197    000012.sst
             849    000015.sst
             882    000016.sst
             882    000017.sst
             882    000018.sst
             882    000019.sst
             882    000020.sst
             882    000021.sst
             882    000022.sst
             882    000023.sst
             882    000024.sst
             882    000025.sst
             882    000026.sst
             882    000027.sst
             882    000028.sst
             882    000029.sst
             882    000030.sst
             882    000031.sst
             882    000032.sst
             882    000033.sst
             882    000034.sst
             882    000035.sst
               0    LOCK
            2234    MANIFEST-000001
            2590    OPTIONS-000003
              16    REMOTE-OBJ-CATALOG-000001
              10    STORAGE_MIN_VERSION
                    auxiliary/
                      sideloading/
                        r0XXXX/
                          r69/
             882            i41.t6
             882            i42.t6
                      sstsnapshot/
               0    marker.format-version.000004.017
               0    marker.manifest.000001.MANIFEST-000001
               0    marker.remote-obj-catalog.000001.REMOTE-OBJ-CATALOG-000001

And 000037 is created by a compaction to L6:

I240530 08:14:08.900099 5336 3@pebble/event.go:816 ⋮ [n2,s2,pebble] 411  [JOB 27] ingested L0:000032 (882B)
I240530 08:14:08.900226 5336 3@pebble/event.go:808 ⋮ [n2,s2,pebble] 412  [JOB 28] ingesting: sstable created 000033
I240530 08:14:08.900321 5336 3@pebble/event.go:816 ⋮ [n2,s2,pebble] 413  [JOB 28] ingested L0:000033 (882B)
I240530 08:14:08.900470 5336 3@pebble/event.go:808 ⋮ [n2,s2,pebble] 414  [JOB 29] ingesting: sstable created 000034
I240530 08:14:08.900599 5336 3@pebble/event.go:816 ⋮ [n2,s2,pebble] 415  [JOB 29] ingested L0:000034 (882B)
I240530 08:14:08.900729 5336 3@pebble/event.go:808 ⋮ [n2,s2,pebble] 416  [JOB 30] ingesting: sstable created 000035
I240530 08:14:08.900820 5336 3@pebble/event.go:816 ⋮ [n2,s2,pebble] 417  [JOB 30] ingested L0:000035 (882B)
I240530 08:14:08.901069 7538 3@pebble/event.go:768 ⋮ [n2,s2,pebble] 418  [JOB 31] compacting(default) L0 [000016] (882B) Score=100.00 + L6 [000008] (1.0KB) Score=0.00; OverlappingRatio: Single 1.17, Multi 0.00
I240530 08:14:08.901170 7539 3@pebble/event.go:768 ⋮ [n2,s2,pebble] 419  [JOB 32] compacting(elision-only) L6 [000004] (1.2KB) Score=0.00 + L6 [] (0B) Score=0.00; OverlappingRatio: Single 0.00, Multi 0.00
I240530 08:14:08.901273 7538 3@pebble/event.go:808 ⋮ [n2,s2,pebble] 420  [JOB 31] compacting: sstable created 000036
I240530 08:14:08.901322 7539 3@pebble/event.go:808 ⋮ [n2,s2,pebble] 421  [JOB 32] compacting: sstable created 000037
I240530 08:14:08.901489 7539 3@pebble/event.go:772 ⋮ [n2,s2,pebble] 422  [JOB 32] compacted(elision-only) L6 [000004] (1.2KB) Score=0.00 + L6 [] (0B) Score=0.00 -> L6 [000037] (972B), in 0.0s (0.0s total), output rate 4.4MB/s
I240530 08:14:08.901590 7526 3@pebble/event.go:768 ⋮ [n2,s2,pebble] 423  [JOB 33] compacting(elision-only) L6 [000005] (1.5KB) Score=0.00 + L6 [] (0B) Score=0.00; OverlappingRatio: Single 0.00, Multi 0.00

nicktrav commented 2 days ago

Removing release-blocker as we are confident this is an issue with the test.

pav-kv commented 2 days ago

@nicktrav Sorry, there might have been a confusion with another TestCrashWhileTruncatingSideloadedEntries failure. It's #117785 that's a test-only issue, but this one is different and seemed like a storage bug.

jbowens commented 2 days ago

> @nicktrav Sorry, there might have been a confusion with another TestCrashWhileTruncatingSideloadedEntries failure. It's https://github.com/cockroachdb/cockroach/issues/117785 that's a test-only issue, but this one is different and seemed like a storage bug.

No, we're pretty confident there's no durability issue within Pebble here. It's possible we're missing something, but Pebble's test coverage in this area is extensive, and we've audited the code in the very few places where this type of durability bug could exist. It's most likely a subtle bug in the test itself.

pav-kv commented 2 days ago

I see, thanks for the context @jbowens.