Compressing on a block-by-block basis seems like a natural approach, and is one used by other storage libraries like goleveldb, rocksdb, couchstore, etc.
One issue is that the current segment implementation of mossStore doesn't even have a block abstraction. :-/ So a new kind of segment implementation is probably needed. Given recent discussions with @mschoch on supporting additional kinds of segment implementations, this might not be too far out of bounds.
One thought would be that a new "compressing" segment kind might be engaged during compaction. The compactWriter{} could buffer up a bunch of mutations and then write out snappy-compressed buf blocks, probably near or related to the codepaths here... https://github.com/couchbase/moss/blob/master/store_compact.go#L332
But the kvs data might stay uncompressed.
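Roughly, the write side could look something like the sketch below (hypothetical names and types, not moss's actual compactWriter API): kvs entries are kept uncompressed, while the buf bytes are chunked into snappy-compressed blocks.

```go
// Hypothetical sketch only -- these types/names are not moss's actual API.
package segment

import (
	"bytes"

	"github.com/golang/snappy"
)

const bufBlockSize = 64 * 1024 // assumed uncompressed block size; tunable

// compressingWriter buffers mutations during compaction, keeping the kvs
// array uncompressed while emitting snappy-compressed buf blocks.
type compressingWriter struct {
	kvs       []uint64     // uncompressed kvs entries (op/offset/length metadata)
	pending   bytes.Buffer // uncompressed buf bytes not yet flushed to a block
	blocks    [][]byte     // snappy-compressed buf blocks, in write order
	blockOffs []int        // uncompressed buf offset where each block starts
	bufLen    int          // total uncompressed buf bytes written so far
}

// writeKeyVal appends one mutation: the kvs metadata stays uncompressed,
// while the key/val bytes go into the pending buf block.
func (w *compressingWriter) writeKeyVal(kvsEntry uint64, key, val []byte) {
	w.kvs = append(w.kvs, kvsEntry)
	w.pending.Write(key)
	w.pending.Write(val)
	w.bufLen += len(key) + len(val)
	if w.pending.Len() >= bufBlockSize {
		w.flushBlock()
	}
}

// flushBlock compresses the pending bytes into a new buf block. Because we
// only flush after appending a whole key/val, a single mutation never spans
// two blocks.
func (w *compressingWriter) flushBlock() {
	if w.pending.Len() == 0 {
		return
	}
	w.blockOffs = append(w.blockOffs, w.bufLen-w.pending.Len())
	w.blocks = append(w.blocks, snappy.Encode(nil, w.pending.Bytes()))
	w.pending.Reset()
}
```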
During a binary search through the kvs array, whenever an entry's buf bytes are needed, the new segment kind would have to round down to the nearest compressed buf block, uncompress that block (the uncompressed bytes can perhaps be cached), and then proceed with the actual buf array lookup.
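A rough sketch of that read path (again hypothetical, and assuming a single key/val never spans a block boundary, which the writer sketch above guarantees by only flushing after a whole mutation):

```go
// Hypothetical sketch only -- not moss's actual segment API.
package segment

import (
	"sort"

	"github.com/golang/snappy"
)

// compressedSegment stores its buf as snappy-compressed blocks while the
// kvs array stays uncompressed for binary search.
type compressedSegment struct {
	kvs       []uint64       // uncompressed kvs array (as in moss today)
	blocks    [][]byte       // snappy-compressed buf blocks
	blockOffs []int          // uncompressed buf offset where each block starts
	cache     map[int][]byte // optional cache of already-uncompressed blocks
}

// bufBytes returns length bytes starting at the given uncompressed buf
// offset, decompressing (and caching) the containing block on demand.
func (s *compressedSegment) bufBytes(offset, length int) ([]byte, error) {
	// Round down to the last block whose start offset is <= offset.
	i := sort.Search(len(s.blockOffs), func(j int) bool {
		return s.blockOffs[j] > offset
	}) - 1

	block, ok := s.cache[i]
	if !ok {
		var err error
		block, err = snappy.Decode(nil, s.blocks[i])
		if err != nil {
			return nil, err
		}
		if s.cache == nil {
			s.cache = map[int][]byte{}
		}
		s.cache[i] = block
	}

	start := offset - s.blockOffs[i]
	return block[start : start+length], nil
}
```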
More thought needed.
From more discussion with @hisundar today, one thought (#1) is to split a big segment into multiple smaller segments, which might have a new segment kind (e.g., "a-compressed") and would need to be uncompressed whenever the segment is read.
Another idea (#2) is that each SegmentLoc might also track its minKey and maxKey, and track whether it is non-overlapping with its neighbors, allowing for a potential binary search amongst SegmentLocs to speed up lookups (a SegmentLoc doesn't have to be examined if it can't contain the key you're looking for). This brings LevelDB-inspired ideas into the picture, where the levels above level 0 in LevelDB are non-overlapping in keyspace.
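A minimal sketch of what idea #2 could look like (the minKey/maxKey fields are assumed, not something SegmentLoc tracks today): with SegmentLocs sorted by key range and mutually non-overlapping, a lookup only ever needs to examine one of them.

```go
// Hypothetical sketch only -- SegmentLoc in moss doesn't track these fields.
package segment

import (
	"bytes"
	"sort"
)

// segmentLocEx sketches a SegmentLoc extended with its key range.
type segmentLocEx struct {
	minKey, maxKey []byte // assumed new fields: smallest/largest key in the segment
}

// findSegmentLoc binary-searches sorted, non-overlapping SegmentLocs and
// returns the index of the only one whose [minKey, maxKey] range could
// contain key, or -1 if no segment can contain it.
func findSegmentLoc(locs []segmentLocEx, key []byte) int {
	i := sort.Search(len(locs), func(j int) bool {
		return bytes.Compare(locs[j].maxKey, key) >= 0
	})
	if i < len(locs) && bytes.Compare(locs[i].minKey, key) <= 0 {
		return i
	}
	return -1
}
```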
See also https://github.com/blevesearch/bleve/issues/553 for a real-world use case.