cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License

panic closing diskHealthCheckingFile #2227

Open dankinder opened 1 year ago

dankinder commented 1 year ago

I have a node that died and now won't come back up because of this error; it crashes right away. This is CockroachDB v22.1.11, and the panic happens when trying to open Pebble on a file system that has gone read-only.

Not that I'd expect this to succeed, but it could fail a little more gracefully than "close of closed channel".

panic: close of closed channel [recovered]
        panic: close of closed channel

goroutine 194 [running]:
github.com/cockroachdb/pebble.Open.func1()
        github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:112 +0x175
panic({0x47e8000, 0x62b6ab0})
        GOROOT/src/runtime/panic.go:1047 +0x266
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).stopTicker(...)
        github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:95
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).Close(0xc000526800)
        github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:108 +0x25
github.com/cockroachdb/pebble/vfs.enospcFile.Close(...)
        github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_full.go:353
github.com/cockroachdb/pebble.Open.func2()
        github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:171 +0x4b
github.com/cockroachdb/pebble.Open({0x7ffe841bf52c, 0xc0008144e0}, 0xc000277400)
        github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:213 +0xe43
github.com/cockroachdb/cockroach/pkg/storage.NewPebble({0x63c85b8, 0xc0014d6150}, {{{{0xc000bcb540, 0x1, 0x1}}, {0x7ffe841bf52c, 0x1a}, 0x0, 0xc6039664cc, 0x40000000, ...}, ...})
        github.com/cockroachdb/cockroach/pkg/storage/pebble.go:867 +0xcb0
github.com/cockroachdb/cockroach/pkg/server.(*Config).CreateEngines(0xc000f9b800, {0x63c85b8, 0xc0014d6150})
        github.com/cockroachdb/cockroach/pkg/server/config.go:649 +0x13c8
github.com/cockroachdb/cockroach/pkg/server.NewServer({{0xc000acca80, 0xc0003387e0, 0xc0002572c0, 0xc00052d950, 0xc001151c68, 0xc001151c50, {0xc0002572c0, {0x631e400, 0xc00052d950}, 0x0, ...}, ...}, ...}, ...)
        github.com/cockroachdb/cockroach/pkg/server/server.go:204 +0x48a
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3.2(0xc0008cb170, 0xc0000107b8, 0xc000793d00, {0x63c85b8, 0xc001506930}, 0x0, {0x235ef01b, 0xedb4e0e15, 0x0})
        github.com/cockroachdb/cockroach/pkg/cli/start.go:613 +0x95
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3()
        github.com/cockroachdb/cockroach/pkg/cli/start.go:669 +0xf6
created by github.com/cockroachdb/cockroach/pkg/cli.runStart
        github.com/cockroachdb/cockroach/pkg/cli/start.go:584 +0x7c5
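
For context, Go raises "close of closed channel" whenever `close` is called a second time on the same channel, so the trace above means the stop channel behind `stopTicker` was closed more than once. The sketch below is not Pebble's code: `healthFile`, `stopper`, and `stopOnce` are hypothetical names used only to illustrate one common way to make such a close idempotent, by guarding it with `sync.Once`. Whether that is the right fix here depends on why `Close` ran twice, which this report does not establish.

```go
package main

import (
	"fmt"
	"sync"
)

// healthFile is a hypothetical stand-in for a file wrapper that owns a
// background ticker goroutine. It is NOT Pebble's diskHealthCheckingFile.
type healthFile struct {
	stopper  chan struct{} // closed to tell the ticker goroutine to exit
	stopOnce sync.Once     // guarantees stopper is closed at most once
}

func newHealthFile() *healthFile {
	return &healthFile{stopper: make(chan struct{})}
}

// stopTicker closes the stop channel at most once, so repeated calls are safe.
func (f *healthFile) stopTicker() {
	f.stopOnce.Do(func() { close(f.stopper) })
}

// Close may be reached from more than one cleanup path (for example an
// explicit close plus a deferred cleanup). The second call is a no-op.
func (f *healthFile) Close() error {
	f.stopTicker()
	return nil
}

func main() {
	f := newHealthFile()
	_ = f.Close()
	_ = f.Close() // a bare close(f.stopper) here would panic instead
	fmt.Println("closed twice without panic")
}
```

A guard like this only hides the double close. If the second `Close` comes from an error path during `Open`, returning the original error to the caller would still be the more informative outcome.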

The node originally crashed because of a disk stall, when the file system for this disk went read-only:

F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726  disk stall detected: pebble unable to write to ‹/srv/disk2/cockroach-data2/10851368.log› in 20.63 seconds
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !goroutine 13173489831 [running]:
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x0)
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0x8a
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc0003ad3c0, {{{0xc01db58030, 0x24}, {0x50b1d7a, 0x2}, {0x0, 0x0}, {0x0, 0x0}}, 0x1737d2f3398e221d, ...})
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/cockroach/pkg/util/log/clog.go:237 +0xb8
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepthInternal({0x63c85b8, 0xc0010af500}, 0x2, 0x4, 0x0, 0x0, {0x504656d, 0x41}, {0xc091617e40, 0x2, ...})
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/cockroach/pkg/util/log/channels.go:106 +0x645
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(...)
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/cockroach/pkg/util/log/channels.go:39
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/cockroach/bazel-out/k8-opt/bin/pkg/util/log/log_channels_generated.go:834
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/cockroach/pkg/storage.(*Pebble).makeMetricEtcEventListener.func2({{0xc07729aba0, 0xc00d329f74}, 0xc00e3d4820})
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/cockroach/pkg/storage/pebble.go:915 +0x265
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/pebble.TeeEventListener.func4({{0xc07729aba0, 0x1146841c}, 0xedb4a7ea0})
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/event.go:634 +0x5b
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/cockroach/pkg/storage.wrapFilesystemMiddleware.func1({0xc07729aba0, 0x9b97ce0}, 0x4cdd46bf8)
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/cockroach/pkg/storage/pebble.go:569 +0x26
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFS).ReuseForWrite.func2(0xc0e63fc81146841c)
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:482 +0x2f
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).startTicker.func1()
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:86 +0x21f
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !created by github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).startTicker
F230106 20:42:08.675930 13173489831 storage/pebble.go:915 ⋮ [n23] 503726 !      github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:67 +0x65

Jira issue: PEBBLE-157

jbowens commented 1 year ago

Maybe related to #2300.