cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.03k stars 3.8k forks source link

storage: key seeker out-of-bounds #132964

Open jbowens opened 5 hours ago

jbowens commented 5 hours ago

Encountered when attempting to enable columnar blocks metamorphically: #132935

Seems like it's likely a bug in either the key seeker or the recording of the maximum key length within a data block.

https://mesolite.cluster.engflow.com/api/contentaddressablestorage/v1/instances/default/blobs/00129a68b75c23eee1956d724723a3d21b84d56c36cb8dc6cc2f6dc574be6b2c/1464900

=== RUN   TestDataDriven_restore_schema_only_multiregion
    test_log_scope.go:165: test logs captured to: outputs.zip/logTestDataDriven_restore_schema_only_multiregion4084208214
    test_log_scope.go:76: use -show-logs to present logs inline
    datadriven_test.go:447: Mock HTTP Storage "http://127.0.0.1:34797"
panic: runtime error: slice bounds out of range [:59] with capacity 49
    panic: Release called on a BufferPool with in-use buffers
goroutine 46564 [running]:
github.com/cockroachdb/pebble/sstable/block.(*BufferPool).Release(0xc00f28d5c8)
    github.com/cockroachdb/pebble/sstable/block/external/com_github_cockroachdb_pebble/sstable/block/buffer_pool.go:147 +0xc5
panic({0x6bfe3a0?, 0xc007195380?})
    GOROOT/src/runtime/panic.go:770 +0x132
github.com/cockroachdb/cockroach/pkg/storage.(*cockroachKeySeeker).MaterializeUserKey(0xc00b5ff970, 0xc0023ecf08, 0x7ffb4f1580a0?, 0x113)
    github.com/cockroachdb/cockroach/pkg/storage/pebble_key_schema.go:442 +0x2b2
github.com/cockroachdb/pebble/sstable/colblk.(*DataBlockIter).Next(0xc0023ece30)
    github.com/cockroachdb/pebble/sstable/colblk/external/com_github_cockroachdb_pebble/sstable/colblk/data_block.go:1144 +0x20f
github.com/cockroachdb/pebble/sstable.(*singleLevelIterator[...]).Next(0x8aab4e0)
    github.com/cockroachdb/pebble/sstable/external/com_github_cockroachdb_pebble/sstable/reader_iter_single_lvl.go:1311 +0x5d
github.com/cockroachdb/pebble.(*levelIter).Next(0xc0044d3408)
    github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/level_iter.go:764 +0x78
github.com/cockroachdb/pebble.(*mergingIter).nextEntry(0xc006c0c6c0, 0xc00657f518, {0x0?, 0x2c?, 0x31?})
    github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/merging_iter.go:541 +0x89
github.com/cockroachdb/pebble.(*mergingIter).Next(0xc006c0c6c0)
    github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/merging_iter.go:1189 +0x49

Jira issue: CRDB-43365

jbowens commented 5 hours ago
panic: runtime error: slice bounds out of range [:143] with capacity 131
    panic: Release called on a BufferPool with in-use buffers
goroutine 96288 [running]:
github.com/cockroachdb/pebble/sstable/block.(*BufferPool).Release(0xc00ddd83c8)
    github.com/cockroachdb/pebble/sstable/block/external/com_github_cockroachdb_pebble/sstable/block/buffer_pool.go:147 +0xc5
panic({0x6bfe3a0?, 0xc0160be4f8?})
    GOROOT/src/runtime/panic.go:770 +0x132
github.com/cockroachdb/cockroach/pkg/storage.(*cockroachKeySeeker).MaterializeUserKey(0xc010329340, 0xc006c94f08, 0x654?, 0x49)
    github.com/cockroachdb/cockroach/pkg/storage/pebble_key_schema.go:442 +0x2b2
github.com/cockroachdb/pebble/sstable/colblk.(*DataBlockIter).Next(0xc006c94e30)
    github.com/cockroachdb/pebble/sstable/colblk/external/com_github_cockroachdb_pebble/sstable/colblk/data_block.go:1144 +0x20f
github.com/cockroachdb/pebble/sstable.(*singleLevelIterator[...]).Next(0x8aab4e0)
    github.com/cockroachdb/pebble/sstable/external/com_github_cockroachdb_pebble/sstable/reader_iter_single_lvl.go:1311 +0x5d
github.com/cockroachdb/pebble.(*levelIter).Next(0xc013450408)
    github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/level_iter.go:764 +0x78
github.com/cockroachdb/pebble.(*mergingIter).nextEntry(0xc00b2537a0, 0xc01049d3b0, {0x0?, 0x5a?, 0x83?})
    github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/merging_iter.go:541 +0x89
github.com/cockroachdb/pebble.(*mergingIter).Next(0xc00b2537a0)
    github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/merging_iter.go:1189 +0x49
github.com/cockroachdb/pebble/internal/keyspan.(*InterleavingIter).nextPos(0xc00adfd310)
    github.com/cockroachdb/pebble/internal/keyspan/external/com_github_cockroachdb_pebble/internal/keyspan/interleaving_iter.go:646 +0x65
github.com/cockroachdb/pebble/internal/keyspan.(*InterleavingIter).Next(0xc00adfd310)
    github.com/cockroachdb/pebble/internal/keyspan/external/com_github_cockroachdb_pebble/internal/keyspan/interleaving_iter.go:478 +0xe5
github.com/cockroachdb/pebble/internal/compact.(*Iter).iterNext(0xc00adfd108)
    github.com/cockroachdb/pebble/internal/compact/external/com_github_cockroachdb_pebble/internal/compact/iterator.go:668 +0x2e
github.com/cockroachdb/pebble/internal/compact.(*Iter).nextInStripeHelper(0xc00adfd108)
    github.com/cockroachdb/pebble/internal/compact/external/com_github_cockroachdb_pebble/internal/compact/iterator.go:714 +0x3d
github.com/cockroachdb/pebble/internal/compact.(*Iter).nextInStripe(...)
    github.com/cockroachdb/pebble/internal/compact/external/com_github_cockroachdb_pebble/internal/compact/iterator.go:705
github.com/cockroachdb/pebble/internal/compact.(*Iter).skipInStripe(0xc00adfd108)
    github.com/cockroachdb/pebble/internal/compact/external/com_github_cockroachdb_pebble/internal/compact/iterator.go:658 +0x25
github.com/cockroachdb/pebble/internal/compact.(*Iter).Next(0xc00adfd108)
    github.com/cockroachdb/pebble/internal/compact/external/com_github_cockroachdb_pebble/internal/compact/iterator.go:528 +0x9f4
RaduBerinde commented 4 hours ago

walltime: 0 logical: 50

This shouldn't be allowed. Perhaps a problem when building the logical time column. I meant to add some assertions for this.

RaduBerinde commented 4 hours ago

Hm, we are writing a key like this, not sure how/why. This a key passed to cockroachKeyWriter.WriteKey: 016B12C8000174786E2D6BE26D82BA914A51A9D12356572D06450000000000000000000000005B0D

RaduBerinde commented 1 hour ago

We can investigate the zero walltime thing separately, it does not explain the issue, our encoding/decoding code should allow it.

In an instance of this, we write a key:

row=67 key=017373746E67128E8912646961676E6F73746963732E7265706F7274696E672E656E61626C656400FF0188000100 keyPrefixLen=46/46

and then when we read it we see a non-zero logical:

panic: row: 67  prefix: 017373746E67128E8912646961676E6F73746963732E7265706F7274696E672E656E61626C656400FF01880001  walltime: 0  logical: 40

I'm suspecting a bug with the "default value" handling of UintBuilder, though I can't find anything wrong with the code so far.

If I add this, I can't repro it anymore:

        switch versionLen {
        case 0:
                // No-op.
+               kw.logicalTimes.Set(row, 0)
        case 9:
                wallTime = binary.BigEndian.Uint64(key[keyPrefixLen : keyPrefixLen+8])
+               kw.logicalTimes.Set(row, 0)
        case 13, 14:
                wallTime = binary.BigEndian.Uint64(key[keyPrefixLen : keyPrefixLen+8])
                kw.logicalTimes.Set(row, uint64(binary.BigEndian.Uint32(key[keyPrefixLen+8:keyPrefixLen+12])))
                // NOTE: byte 13 used to store the timestamp's synthetic bit, but this is no
                // longer consulted and can be ignored during decoding.
        default:
                // Not a MVCC timestamp.
                untypedVersion = key[keyPrefixLen:]
+               kw.logicalTimes.Set(row, 0)
        }