cockroachdb / cockroach

storage: write test for ENOSPC #63847

Open jbowens opened 3 years ago

jbowens commented 3 years ago

This may have been resolved by #64385. I'm reframing this issue to write a test (adapt the disk-full roachtest?) that intentionally exhausts all available disk space and ensures no corruption results.


Original issue:

While attempting to reproduce the AddSSTable checksum failure #63297, I accidentally exhausted disk space on many nodes during a restore. On a couple of nodes, this manifested as a panic from an iterator trying to read an empty sstable. Both instances were on nodes with encryption-at-rest enabled.

panic: pebble/table: invalid table (file size is too small) [recovered]
    panic: pebble/table: invalid table (file size is too small) [recovered]
    panic: pebble/table: invalid table (file size is too small) [recovered]
    panic: pebble/table: invalid table (file size is too small)

goroutine 145024318 [running]:
panic(0x46ced20, 0xc01071e800)
    /usr/local/go/src/runtime/panic.go:1064 +0x545 fp=0xc0016a7528 sp=0xc0016a7460 pc=0x48b2c5
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000d13780, 0x59dd0a0, 0xc009bcd940)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126 fp=0xc0016a7588 sp=0xc0016a7528 pc=0x14ee926
runtime.call32(0x0, 0x52999c8, 0xc0105190c8, 0x1800000018)
    /usr/local/go/src/runtime/asm_amd64.s:540 +0x3e fp=0xc0016a75b8 sp=0xc0016a7588 pc=0x4c2d9e
runtime.reflectcallSave(0xc0016a76f8, 0x52999c8, 0xc0105190c8, 0x18)
    /usr/local/go/src/runtime/panic.go:881 +0x58 fp=0xc0016a75e8 sp=0xc0016a75b8 pc=0x48acb8
runtime.runOpenDeferFrame(0xc002162000, 0xc010519080, 0x0)
    /usr/local/go/src/runtime/panic.go:855 +0x2cd fp=0xc0016a7678 sp=0xc0016a75e8 pc=0x48ab6d
panic(0x46ced20, 0xc01071e800)
    /usr/local/go/src/runtime/panic.go:969 +0x1b9 fp=0xc0016a7740 sp=0xc0016a7678 pc=0x48af39
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000d13780, 0x59dd160, 0xc00adf1740)
    /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126 fp=0xc0016a77a0 sp=0xc0016a7740 pc=0x14ee926
runtime.call32(0x0, 0x52999c8, 0xc0105190c8, 0x1800000018)
    /usr/local/go/src/runtime/asm_amd64.s:540 +0x3e fp=0xc0016a77d0 sp=0xc0016a77a0 pc=0x4c2d9e
runtime.reflectcallSave(0xc0016a7910, 0x52999c8, 0xc0105190c8, 0xc000000018)
    /usr/local/go/src/runtime/panic.go:881 +0x58 fp=0xc0016a7800 sp=0xc0016a77d0 pc=0x48acb8
runtime.runOpenDeferFrame(0xc002162000, 0xc010519080, 0x0)
    /usr/local/go/src/runtime/panic.go:855 +0x2cd fp=0xc0016a7890 sp=0xc0016a7800 pc=0x48ab6d
panic(0x46ced20, 0xc01071e800)
    /usr/local/go/src/runtime/panic.go:969 +0x1b9 fp=0xc0016a7958 sp=0xc0016a7890 pc=0x48af39
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Send.func1(0xc0016aa318, 0xc0016aa3b8, 0xc000d2d100, 0xc0016aa3b0)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_send.go:100 +0x247 fp=0xc0016a7980 sp=0xc0016a7958 pc=0x1f28347
runtime.call32(0x0, 0x52950f0, 0xc0016a9c30, 0x2000000020)
    /usr/local/go/src/runtime/asm_amd64.s:540 +0x3e fp=0xc0016a79b0 sp=0xc0016a7980 pc=0x4c2d9e
panic(0x46ced20, 0xc01071e800)
    /usr/local/go/src/runtime/panic.go:975 +0x47a fp=0xc0016a7a78 sp=0xc0016a79b0 pc=0x48b1fa
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleIterator).destroy(0xc000a64028)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_iterator.go:724 +0x1a7 fp=0xc0016a7b90 sp=0xc0016a7a78 pc=0x1ba4bc7
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleBatch).Close(0xc000a64000)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_batch.go:113 +0x6a fp=0xc0016a7bb8 sp=0xc0016a7b90 pc=0x1b9b92a
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateProposal(0xc008a3c900, 0x59dd160, 0xc00adf17a0, 0xc01b5eb388, 0x8, 0xc0050f35e0, 0x167626a584079259, 0x0, 0xc01f99d7a0, 0xc011f76000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:786 +0xb52 fp=0xc0016a85b0 sp=0xc0016a7bb8 pc=0x1e86652
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).requestToProposal(0xc008a3c900, 0x59dd160, 0xc00adf17a0, 0xc01b5eb388, 0x8, 0xc0050f35e0, 0x16761efcc9e8323e, 0x0, 0x0, 0x800000008, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:884 +0xb6 fp=0xc0016a8658 sp=0xc0016a85b0 pc=0x1e86916
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evalAndPropose(0xc008a3c900, 0x59dd160, 0xc00adf17a0, 0xc0050f35e0, 0xc01aa4b5e0, 0x16761efcc9e8323e, 0x0, 0x0, 0x800000008, 0x2, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:84 +0x197 fp=0xc0016a8e18 sp=0xc0016a8658 pc=0x1e8de97
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeWriteBatch(0xc008a3c900, 0x59dd160, 0xc00adf17a0, 0xc0050f35e0, 0xc01aa4b5e0, 0x0, 0x0, 0x0)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_write.go:138 +0x865 fp=0xc0016a95d0 sp=0xc0016a8e18 pc=0x1ebe0e5
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries(0xc008a3c900, 0x59dd160, 0xc00adf17a0, 0xc0050f35e0, 0x5295098, 0x0, 0x0)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:275 +0x336 fp=0xc0016a97d8 sp=0xc0016a95d0 pc=0x1eb1b16
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendWithRangeID(0xc008a3c900, 0x59dd160, 0xc00adf1770, 0x2be9, 0xc0050f35e0, 0x0, 0x1)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:95 +0x55d fp=0xc0016a9a10 sp=0xc0016a97d8 pc=0x1eb105d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Send(...)
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:34
    ...
github.com/cockroachdb/pebble/internal/base.CorruptionErrorf
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/internal/base/error.go:27
github.com/cockroachdb/pebble/sstable.readFooter
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/table.go:219
github.com/cockroachdb/pebble/sstable.NewReader
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/sstable/reader.go:2302
github.com/cockroachdb/pebble.(*tableCacheValue).load
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/table_cache.go:596
github.com/cockroachdb/pebble.(*tableCacheShard).findNode.func2
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/table_cache.go:400
runtime/pprof.Do
    /usr/local/go/src/runtime/pprof/runtime.go:40
github.com/cockroachdb/pebble.(*tableCacheShard).findNode
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/table_cache.go:399
github.com/cockroachdb/pebble.(*tableCacheShard).newIters
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/table_cache.go:189
github.com/cockroachdb/pebble.(*tableCache).newIters
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/table_cache.go:56
github.com/cockroachdb/pebble.(*levelIter).loadFile
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/level_iter.go:338
github.com/cockroachdb/pebble.(*levelIter).SeekPrefixGE
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/level_iter.go:408
github.com/cockroachdb/pebble.(*mergingIter).seekGE
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/merging_iter.go:830
github.com/cockroachdb/pebble.(*mergingIter).SeekPrefixGE
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/merging_iter.go:899
github.com/cockroachdb/pebble.(*Iterator).SeekPrefixGE
    /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/iterator.go:729
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleIterator).SeekGE
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_iterator.go:262
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).get
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:185
github.com/cockroachdb/cockroach/pkg/storage.mvccGet
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:886
github.com/cockroachdb/cockroach/pkg/storage.MVCCGet
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:847
github.com/cockroachdb/cockroach/pkg/storage.MVCCGetProto
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:718
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).GetLastReplicaGCTimestamp
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica.go:1045
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaGCQueue).shouldQueue
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_gc_queue.go:133
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).maybeAdd
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:661
github.com/cockroachdb/cockroach/pkg/kv/kvserver.baseQueueHelper.MaybeAdd
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:548
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).MaybeAddAsync.func1
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:594
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).Async.func1
    /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:581
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunLimitedAsyncTask.func2
    /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:410
...
I210415 21:53:22.780831 144997202 3@vendor/github.com/cockroachdb/pebble/compaction.go:2773 ⋮ [n8,pebble] 122282  [JOB 33139] sstable deleted 046297
I210415 21:53:22.780933 144997151 3@vendor/github.com/cockroachdb/pebble/compaction.go:1840 ⋮ [n8,pebble] 122283  [JOB 33140] compacting L6 [046303] (1.0 K) + L6 [] (0 B)
I210415 21:53:22.876410 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122284  [JOB 33137] ingesting: sstable created 046504
I210415 21:53:23.347380 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122285  [JOB 33137] ingesting: sstable created 046505
I210415 21:53:23.536609 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122286  [JOB 33137] ingesting: sstable created 046506
I210415 21:53:23.537503 144997151 3@vendor/github.com/cockroachdb/pebble/compaction.go:1881 ⋮ [n8,pebble] 122287  [JOB 33140] compacted L6 [046303] (1.0 K) + L6 [] (0 B) -> L6 [] (0 B), in 0.8s, output rate 0 B/s
I210415 21:53:23.886055 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122288  [JOB 33137] ingesting: sstable created 046507
I210415 21:53:24.669590 144968251 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122289  [JOB 33138] ingesting: sstable created 046509
I210415 21:53:25.143758 144968251 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122290  [JOB 33138] ingesting: sstable created 046514
I210415 21:53:25.298327 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:616 ⋮ [n8,pebble] 122291  ingest failed to remove original file: write ‹/mnt/data1/cockroach/COCKROACHDB_REGISTRY.crdbtmp›: no space left on device
I210415 21:53:25.652820 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:616 ⋮ [n8,pebble] 122292  ingest failed to remove original file: write ‹/mnt/data1/cockroach/COCKROACHDB_REGISTRY.crdbtmp›: no space left on device
I210415 21:53:25.824620 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:616 ⋮ [n8,pebble] 122293  ingest failed to remove original file: write ‹/mnt/data1/cockroach/COCKROACHDB_REGISTRY.crdbtmp›: no space left on device
I210415 21:53:26.008105 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:616 ⋮ [n8,pebble] 122294  ingest failed to remove original file: write ‹/mnt/data1/cockroach/COCKROACHDB_REGISTRY.crdbtmp›: no space left on device
I210415 21:53:26.333206 144975416 3@vendor/github.com/cockroachdb/pebble/ingest.go:637 ⋮ [n8,pebble] 122295  [JOB 33137] ingested L4:046503 (1.3 K), L4:046508 (1.2 K), L4:046504 (1.3 K), L4:046505 (1.1 K), L4:046506 (1.0 K), L6:046507 (1.0 K)
I210415 21:53:26.523448 144993746 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122296  [JOB 33142] ingesting: sstable created 046515
I210415 21:53:26.789442 144968251 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122297  [JOB 33138] ingesting: sstable created 046510
I210415 21:53:26.789455 145015306 3@vendor/github.com/cockroachdb/pebble/compaction.go:2773 ⋮ [n8,pebble] 122298  [JOB 33143] sstable deleted 046303
I210415 21:53:26.789544 145015304 3@vendor/github.com/cockroachdb/pebble/compaction.go:1840 ⋮ [n8,pebble] 122299  [JOB 33144] compacting L6 [046309] (1.0 K) + L6 [] (0 B)
I210415 21:53:27.003093 144475891 3@vendor/github.com/cockroachdb/pebble/compaction.go:1881 ⋮ [n8,pebble] 122300  [JOB 32978] compaction to L6 error: write ‹/mnt/data1/cockroach/046286.sst›: no space left on device
I210415 21:53:27.003220 144475891 3@vendor/github.com/cockroachdb/pebble/compaction.go:1816 ⋮ [n8,pebble] 122301  background error: write ‹/mnt/data1/cockroach/046286.sst›: no space left on device
I210415 21:53:27.346501 144993746 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122302  [JOB 33142] ingesting: sstable created 046520
I210415 21:53:27.642201 144968251 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122303  [JOB 33138] ingesting: sstable created 046511
I210415 21:53:28.092050 144993746 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122304  [JOB 33142] ingesting: sstable created 046516
I210415 21:53:28.248290 144968251 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122305  [JOB 33138] ingesting: sstable created 046512
I210415 21:53:28.248941 145015304 3@vendor/github.com/cockroachdb/pebble/compaction.go:1881 ⋮ [n8,pebble] 122306  [JOB 33144] compacted L6 [046309] (1.0 K) + L6 [] (0 B) -> L6 [] (0 B), in 1.5s, output rate 0 B/s
I210415 21:53:28.251125 145023492 3@vendor/github.com/cockroachdb/pebble/compaction.go:2773 ⋮ [n8,pebble] 122307  [JOB 33144] sstable deleted 046309
I210415 21:53:28.251147 145023493 3@vendor/github.com/cockroachdb/pebble/compaction.go:1840 ⋮ [n8,pebble] 122308  [JOB 33145] compacting L0 [046336 046356 046382 046402 046422 046442 046456 046476 046502] (681 K) + L2 [046285] (340 K)
I210415 21:53:28.586238 144993746 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122309  [JOB 33142] ingesting: sstable created 046517
I210415 21:53:28.755631 144968251 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122310  [JOB 33138] ingesting: sstable created 046513
I210415 21:53:29.199469 144993746 3@vendor/github.com/cockroachdb/pebble/ingest.go:268 ⋮ [n8,pebble] 122311  [JOB 33142] ingesting: sstable created 046518

cockroach.jackson-1618504336-22-n10cpu4-0008.ubuntu.2021-04-15T21_14_52Z.008324.log
cockroach-pebble.jackson-1618504336-22-n10cpu4-0008.ubuntu.2021-04-15T17_27_56Z.008324.log
cockroach-pebble.jackson-1618504336-22-n10cpu4-0008.ubuntu.2021-04-15T18_55_18Z.008324.log
cockroach-stderr.jackson-1618504336-22-n10cpu4-0008.ubuntu.2021-04-15T17_27_56Z.008324.log

Jira issue: CRDB-6794

jbowens commented 3 years ago

Here are log files from the other instance.

cockroach-stderr.jackson-1618504336-40-n10cpu4-0006.ubuntu.2021-04-15T18_16_15Z.014125.log
cockroach.jackson-1618504336-40-n10cpu4-0006.ubuntu.2021-04-15T20_57_34Z.014125.log
cockroach-pebble.jackson-1618504336-40-n10cpu4-0006.ubuntu.2021-04-15T20_04_29Z.014125.log

jbowens commented 2 years ago

This may have been resolved by https://github.com/cockroachdb/cockroach/pull/64385. I'm reframing this issue to write a test (adapt the disk-full roachtest?) that intentionally exhausts all available disk space and ensures no corruption results.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!