cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License

tool: build an `sstable scan` cmd that can read the backing file contents through a virtual reader #3302

Open msbutler opened 7 months ago

msbutler commented 7 months ago

Currently, there isn't a great way to inspect the backing file contents of a virtual file. We ought to build an `sstable scan`-like command that takes just a file number, resolves the path to the file and opens it, applies all of the virtual sstable's transforms to it, and prints every key within it.

Note: this issue was originally written as: Currently, the reader.Layout() command serves a similar purpose: the sstable tool and the checksum codepath use the reader.Layout() function to inspect the block organization of a local sstable. If a Virtual Reader calls Layout() to inspect its backing file, Layout() is unaware of certain transformations (e.g. virtual sst bounds, synthetic suffix replacement) that normally affect the virtual read path. While this doesn't seem to be a problem currently, as these transformations do not affect checksum results, we should consider teaching the Layout() call about virtual sst transformations so that backing files are read in a consistent manner throughout the codebase.
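To make the intended behavior concrete, here is a minimal, self-contained sketch of what "resolve the backing file, apply the virtual transforms, print the keys" could look like. It does not use Pebble's API: `virtualMeta`, `backingFiles`, `replaceSuffix`, and `scanVirtual` are hypothetical names, keys are plain strings of the form `prefix@suffix`, and the bounds are assumed to be inclusive and to apply to the key prefix only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// virtualMeta is a hypothetical stand-in for the metadata describing a
// virtual sstable. It is NOT Pebble's FileMetadata.
type virtualMeta struct {
	backingFileNum  uint64
	smallest        string // inclusive lower bound on the key prefix (assumption)
	largest         string // inclusive upper bound on the key prefix (assumption)
	syntheticSuffix string // if non-empty, replaces each key's suffix
}

// backingFiles is a hypothetical in-memory "file system" mapping a file
// number to the sorted keys physically stored in that backing file.
var backingFiles = map[uint64][]string{
	12: {"a@5", "b@5", "c@5", "d@5", "e@5"},
}

// prefix returns the part of the key before the last '@'.
func prefix(key string) string {
	if i := strings.LastIndexByte(key, '@'); i >= 0 {
		return key[:i]
	}
	return key
}

// replaceSuffix swaps the key's suffix for the synthetic one, if any.
func replaceSuffix(key, suffix string) string {
	if suffix == "" {
		return key
	}
	return prefix(key) + suffix
}

// scanVirtual resolves the backing file by number, restricts it to the
// virtual bounds, and applies suffix replacement to every surviving key.
func scanVirtual(m virtualMeta) ([]string, error) {
	keys, ok := backingFiles[m.backingFileNum]
	if !ok {
		return nil, fmt.Errorf("backing file %06d not found", m.backingFileNum)
	}
	lo := sort.Search(len(keys), func(i int) bool { return prefix(keys[i]) >= m.smallest })
	hi := sort.Search(len(keys), func(i int) bool { return prefix(keys[i]) > m.largest })
	if hi < lo {
		hi = lo // empty range when the bounds are inverted
	}
	out := make([]string, 0, hi-lo)
	for _, k := range keys[lo:hi] {
		out = append(out, replaceSuffix(k, m.syntheticSuffix))
	}
	return out, nil
}

func main() {
	keys, err := scanVirtual(virtualMeta{
		backingFileNum: 12, smallest: "b", largest: "d", syntheticSuffix: "@9",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(keys) // prints [b@9 c@9 d@9]
}
```

A physical view of file 12 would list all five keys with their stored `@5` suffix, whereas this scan shows only what a reader of the virtual sstable would see.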

Jira issue: PEBBLE-112

Epic CRDB-40359

itsbilal commented 7 months ago

@msbutler we talked about this in the storage weekly and felt that Layout() is probably best suited to keep reflecting the true physical form of the sstable, that is, without any interpretative vsst transformations from the FileMetadata applied to it. However, we'd probably like to see an sstable scan-like command that takes in just a file number, resolves the path to it (and opens it), applies all transforms to the file, and spits out all the keys within it. That will likely be useful. Are you open to changing up this issue to reflect that?

msbutler commented 7 months ago

Done.