Open jiacai2050 opened 3 days ago
Interesting, I'll get into it.
Here are some key design considerations to clarify in advance:
Those are all great questions. We can add a `version` field to deal with schema evolution: if we want to add fields to the manifest, a new version can be added, and on merge the old manifest is converted to the new one.
As for the third question, that is why we need to keep the metadata of each SST small, so that one manifest snapshot smaller than 1GB can hold millions of SST files:
1024*1024*1024 / 28 (size of each sst's metadata) = 38347922
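The estimate above can be checked with a few lines of Rust. This is only a sketch of the arithmetic: the constant and function names are made up for illustration, and the 28-byte record size is the figure quoted in this comment.

```rust
/// Upper bound on the snapshot size discussed above: 1 GiB.
const SNAPSHOT_BUDGET_BYTES: u64 = 1024 * 1024 * 1024;

/// Size of one SST's metadata record, as quoted in this comment.
const SST_META_BYTES: u64 = 28;

/// Number of whole records of `record_size` bytes that fit into `budget` bytes.
/// Integer division rounds down, so a partial trailing record is not counted.
fn max_records(budget: u64, record_size: u64) -> u64 {
    budget / record_size
}
```

Calling `max_records(SNAPSHOT_BUDGET_BYTES, SST_META_BYTES)` gives 38347922, matching the number above.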
Describe This Problem
Currently our manifest is defined using protobuf: https://github.com/apache/horaedb/blob/9e81c4ed5df1998cbd210dc48fc67b6b7405a553/horaedb/pb_types/protos/sst.proto#L32
Protobuf is useful for schema evolution, but not very efficient in our case: for a `Vec<struct>` field, protobuf will serialize the metadata of every struct, which is a waste of space.
Proposal
We can serialize the manifest ourselves. A proposed format:
When updating incrementally, we can just append a new record at the end.
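To make the append-only idea concrete, here is a minimal sketch in Rust. It is not the format proposed in this issue: the one-byte version header, the `SstRecord` fields, and all function names are hypothetical, chosen only so that a record adds up to the 28 bytes mentioned in the comments. Because every record has the same fixed size, no per-struct metadata is written, and an incremental update is a plain append.

```rust
/// Hypothetical fixed-size SST record. The field set is an assumption,
/// picked so that the encoded size is 8 + 8 + 8 + 4 = 28 bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SstRecord {
    id: u64,
    start_ms: i64,
    end_ms: i64,
    size: u32,
}

const RECORD_LEN: usize = 28;

fn read8(b: &[u8]) -> [u8; 8] {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    a
}

fn read4(b: &[u8]) -> [u8; 4] {
    let mut a = [0u8; 4];
    a.copy_from_slice(b);
    a
}

impl SstRecord {
    /// Encode as little-endian fields with no tags or length prefixes.
    fn encode(&self) -> [u8; RECORD_LEN] {
        let mut buf = [0u8; RECORD_LEN];
        buf[0..8].copy_from_slice(&self.id.to_le_bytes());
        buf[8..16].copy_from_slice(&self.start_ms.to_le_bytes());
        buf[16..24].copy_from_slice(&self.end_ms.to_le_bytes());
        buf[24..28].copy_from_slice(&self.size.to_le_bytes());
        buf
    }

    /// Decode one record; `bytes` must be exactly RECORD_LEN long.
    fn decode(bytes: &[u8]) -> SstRecord {
        SstRecord {
            id: u64::from_le_bytes(read8(&bytes[0..8])),
            start_ms: i64::from_le_bytes(read8(&bytes[8..16])),
            end_ms: i64::from_le_bytes(read8(&bytes[16..24])),
            size: u32::from_le_bytes(read4(&bytes[24..28])),
        }
    }
}

/// Start a manifest buffer with a single version byte (hypothetical header).
fn new_manifest(version: u8) -> Vec<u8> {
    vec![version]
}

/// Incremental update: append the encoded record at the end.
fn append_record(manifest: &mut Vec<u8>, rec: &SstRecord) {
    manifest.extend_from_slice(&rec.encode());
}

/// Read the version and all records; returns None if the buffer is empty
/// or the body is not a whole number of records.
fn read_records(manifest: &[u8]) -> Option<(u8, Vec<SstRecord>)> {
    let (&version, body) = manifest.split_first()?;
    if body.len() % RECORD_LEN != 0 {
        return None;
    }
    let records = body.chunks_exact(RECORD_LEN).map(SstRecord::decode).collect();
    Some((version, records))
}
```

A reader would check the version byte first and, on merge, convert records written under an older version to the current layout, in line with the `version` field idea discussed in the comments.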
Additional Context
This is where the manifest gets merged: