cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

storage/backupccl: DownloadSpan call hung for 4+ minutes #121805

Closed by msbutler 5 months ago

msbutler commented 5 months ago

Running ./dev test pkg/ccl/backupccl/backuprand --stress on my gceworker on master (sha 2bd61ba8700c0ada35d8d6dd24546e2a36999900) failed with a test timeout. A sampling of the stack traces shows several downloadSpan calls stuck for 4 minutes, apparently waiting on the main DB mutex.

goroutine 5964 [sync.Cond.Wait, 4 minutes]:
sync.runtime_notifyListWait(0xc003b941f8, 0x7)
  GOROOT/src/runtime/sema.go:569 +0x159
sync.(*Cond).Wait(0xc003b94008?)
  GOROOT/src/sync/cond.go:70 +0x85
github.com/cockroachdb/pebble.(*DB).downloadSpan.func5(0xc003b94008, 0xc00624bc18?, 0xc00624bc00)
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/db.go:1998 +0x130
github.com/cockroachdb/pebble.(*DB).downloadSpan(0xc003b94008, {0x7e39760, 0xc009883bd0}, {{0xc008c4bc88, 0x3, 0x8}, {0xc008c4bc90, 0x3, 0x8}, 0x1})
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/db.go:2000 +0x328
github.com/cockroachdb/pebble.(*DB).Download(0xc003b94008, {0x7e39798?, 0xc0077b6ae0?}, {0xc008aafdf0, 0x1, 0x4?})
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/db.go:2034 +0x210
github.com/cockroachdb/cockroach/pkg/storage.(*Pebble).Download(0xc00144b688, {0x7e39760?, 0xc000053ea0?}, {{0xc008c4bc88, 0x3, 0x8}, {0xc008c4bc90, 0x3, 0x8}}, 0x1)
  github.com/cockroachdb/cockroach/pkg/storage/pebble.go:1024 +0x1e5
github.com/cockroachdb/cockroach/pkg/server.downloadSpans.func1({0x7e39760, 0xc000053ea0}, 0xf0f00c005372008?)
  github.com/cockroachdb/cockroach/pkg/server/span_download.go:133 +0x12a
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.GroupWorkers.func1({0x7e39760?, 0xc000053ea0?})
  github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:177 +0x25
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.GroupWorkers.Group.GoCtx.func2()
  github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x1f
golang.org/x/sync/errgroup.(*Group).Go.func1()
  golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 4319
  golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75 +0x96

another stack trace:

goroutine 5955 [sync.Mutex.Lock, 4 minutes]:
sync.runtime_SemacquireMutex(0x0?, 0x2?, 0x4f4729280c2de?)
  GOROOT/src/runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc003b94120)
  GOROOT/src/sync/mutex.go:171 +0x15d
sync.(*Mutex).Lock(...)
  GOROOT/src/sync/mutex.go:90
github.com/cockroachdb/pebble.(*DB).downloadSpan.func5(0xc003b94008, 0xc00745dc18, 0xc00745dc00)
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/db.go:1982 +0x65
github.com/cockroachdb/pebble.(*DB).downloadSpan(0xc003b94008, {0x7e39760, 0xc009883950}, {{0xc008c4bad8, 0x2, 0x8}, {0xc008c4bae0, 0x2, 0x8}, 0x1})
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/db.go:2000 +0x328
github.com/cockroachdb/pebble.(*DB).Download(0xc003b94008, {0x7e39798?, 0xc0077b67b0?}, {0xc00745ddf0, 0x1, 0xc0000e8008?})
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/db.go:2034 +0x210
github.com/cockroachdb/cockroach/pkg/storage.(*Pebble).Download(0xc00144b688, {0x7e39760?, 0xc000053ea0?}, {{0xc008c4bad8, 0x2, 0x8}, {0xc008c4bae0, 0x2, 0x8}}, 0x1)
  github.com/cockroachdb/cockroach/pkg/storage/pebble.go:1024 +0x1e5
github.com/cockroachdb/cockroach/pkg/server.downloadSpans.func1({0x7e39760, 0xc000053ea0}, 0xf0f00c00584c008?)
  github.com/cockroachdb/cockroach/pkg/server/span_download.go:133 +0x12a
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.GroupWorkers.func1({0x7e39760?, 0xc000053ea0?})
  github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:177 +0x25
github.com/cockroachdb/cockroach/pkg/util/ctxgroup.GroupWorkers.Group.GoCtx.func2()
  github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x1f
golang.org/x/sync/errgroup.(*Group).Go.func1()
  golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 4319
  golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75 +0x96

goroutine 5975 [sync.Cond.Wait, 4 minutes]:
sync.runtime_notifyListWait(0xc0002478b0, 0x3)
  GOROOT/src/runtime/sema.go:569 +0x159
sync.(*Cond).Wait(0x0?)
  GOROOT/src/sync/cond.go:70 +0x85
github.com/cockroachdb/pebble.(*versionSet).logLock(...)
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/version_set.go:389
github.com/cockroachdb/pebble.(*DB).compact1.func2(0xc003b94008, 0xc006243808, 0x3e, 0xc001d5f680)
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2380 +0x46
github.com/cockroachdb/pebble.(*DB).compact1(0xc003b94008, 0xc006243808, 0xc00672b380)
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2397 +0x24b
github.com/cockroachdb/pebble.(*DB).compact.func1({0x7e39798?, 0xc0070f1bc0?})
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2338 +0xa7
runtime/pprof.Do({0x7e39638?, 0xbfbf220?}, {{0xc000910b20?, 0x10d2a80?, 0xc001443d40?}}, 0xc002d7af88)
  GOROOT/src/runtime/pprof/runtime.go:51 +0x9d
github.com/cockroachdb/pebble.(*DB).compact(0xc002d7af60?, 0xc002d7af90?, 0xc002d7af80?)
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2335 +0x65
created by github.com/cockroachdb/pebble.(*DB).maybeScheduleDownloadCompaction in goroutine 5954
  github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:1957 +0x4a5

Jira issue: CRDB-37560

blathers-crl[bot] commented 5 months ago

Hi @msbutler, please add branch-* labels to identify which branch(es) this GA-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

msbutler commented 5 months ago

fwiw, i reran this test because we're still seeing empty virtual ssts https://github.com/cockroachdb/cockroach/issues/121751

msbutler commented 5 months ago

i'll try running this test with the race detector to get a more descriptive message.

itsbilal commented 5 months ago

This sounds like a Cockroach version of https://github.com/cockroachdb/pebble/issues/3470

itsbilal commented 5 months ago

Good thing is the fix for that is already in pebble and is just awaiting a pebble bump in cockroach: https://github.com/cockroachdb/pebble/pull/3483

msbutler commented 5 months ago

ah sweet, glad it wasn't a new bug!

RaduBerinde commented 5 months ago

I will close when I bump the dep later today.

msbutler commented 5 months ago

closed via https://github.com/cockroachdb/cockroach/pull/121869