Open rockeet opened 11 months ago
current or alive wal-blob will be different.
3.2 Still not friendly for general-use workloads. To eliminate read IO, the blob written into wal-blob should be inserted into the block cache immediately after it is written, which in turn means memory usage is not reduced.
3.3 With careful implementation, it should be a great option for write-heavy workloads. Mmap is not a devil: rocksdb also has `PosixMmapReadableFile`, and WAL and L0 files contain hot data that is very likely to be in the page cache, so mmap read latencies are predictable.
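
To make the mmap read path concrete, here is a minimal POSIX sketch of reading a byte range through a read-only mapping. It is not RocksDB code and does not reproduce `PosixMmapReadableFile`; `MmapRead` and `WriteFile` are hypothetical helpers invented for this illustration. The point is only that when the file was written recently and its pages are still in the page cache, the copy out of the mapping is normally served from memory.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: read up to `len` bytes at `offset` from `path` through
// a read-only shared mapping. Returns an empty string on any failure.
std::string MmapRead(const std::string& path, size_t offset, size_t len) {
  int fd = ::open(path.c_str(), O_RDONLY);
  if (fd < 0) return "";
  struct stat st;
  if (::fstat(fd, &st) != 0 || st.st_size <= 0 ||
      offset >= static_cast<size_t>(st.st_size)) {
    ::close(fd);
    return "";
  }
  const size_t file_size = static_cast<size_t>(st.st_size);
  void* base = ::mmap(nullptr, file_size, PROT_READ, MAP_SHARED, fd, 0);
  ::close(fd);  // the mapping stays valid after the descriptor is closed
  if (base == MAP_FAILED) return "";
  if (len > file_size - offset) len = file_size - offset;  // clamp to EOF
  assert(offset + len <= file_size);
  // If these pages are already in the page cache, this copy needs no disk read.
  std::string out(static_cast<const char*>(base) + offset, len);
  ::munmap(base, file_size);
  return out;
}

// Hypothetical setup helper: write `data` with plain write(2) and no fsync,
// so the bytes sit in the page cache as dirty pages.
bool WriteFile(const std::string& path, const std::string& data) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  ssize_t n = ::write(fd, data.data(), data.size());
  ::close(fd);
  return n == static_cast<ssize_t>(data.size());
}
```

For example, after `WriteFile(p, "hello wal page cache")`, calling `MmapRead(p, 6, 3)` yields `"wal"` without the data ever having been synced to disk.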
Our simplified implementation (MemTable `ConvertToSST`) has already shown the advantages: it optimizes both reads and writes.
IO write bandwidth is not reduced in `ConvertToSST` if the MemTable SST is synced to disk. Even if it is not synced (the files will be compacted and deleted very soon, and the ext4 filesystem will not auto-sync deleted files with dirty pages), a process crash still does no harm.
Blob files are static. This is really a small issue; with careful design, it should not be a blocking issue.
The WAL normally uses standard (buffered) IO, so data is written to the page cache first, and the same data is also written to the MemTable. When the MemTable is full, it is flushed to an L0 SST.
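
The write path described above can be sketched with a toy model that counts how many times the same payload is copied. This is not RocksDB code: `ToyDB`, its fields, and the byte-count limit are all hypothetical, and the "WAL" and "SST" are plain in-memory strings standing in for the page cache and an L0 file.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model (hypothetical, not RocksDB): tracks bytes copied at each stage.
struct ToyDB {
  std::string wal;                              // stands in for WAL pages in the page cache
  std::map<std::string, std::string> memtable;  // sorted in-memory table
  std::vector<std::string> l0_files;            // one serialized "SST" per flush
  size_t memtable_limit;                        // flush threshold in bytes
  size_t memtable_bytes = 0;
  size_t wal_bytes_written = 0;
  size_t memtable_bytes_written = 0;
  size_t sst_bytes_written = 0;

  explicit ToyDB(size_t limit) : memtable_limit(limit) {}

  // Assumes distinct keys, to keep the byte accounting simple.
  void Put(const std::string& key, const std::string& value) {
    const size_t n = key.size() + value.size();
    // Copy 1: append the record to the WAL (buffered IO, page cache first).
    wal += key;
    wal += value;
    wal_bytes_written += n;
    // Copy 2: insert the same data into the MemTable.
    memtable[key] = value;
    memtable_bytes += n;
    memtable_bytes_written += n;
    if (memtable_bytes >= memtable_limit) Flush();
  }

  // Copy 3: serialize the full MemTable, in key order, into a new L0 file.
  void Flush() {
    std::string sst;
    for (const auto& kv : memtable) {
      sst += kv.first;
      sst += kv.second;
    }
    sst_bytes_written += sst.size();
    l0_files.push_back(sst);
    memtable.clear();
    memtable_bytes = 0;
    wal.clear();  // the WAL covering the flushed data is no longer needed
  }
};
```

With a 10-byte limit, two 5-byte records produce 10 bytes in each of the three counters: the same payload is copied three times before the first compaction even starts.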
There is waste in such a solution:
An optimized solution should be:
`fdatasync` will not update file metadata
`write`), write to mmap should be denied
`convert` the WAL file to an SST file:
`TableFactory`)
On our branch, we have implemented a simplified version of this solution:
`Convert` MemTable to SST instead of `Flush` MemTable. This gains many improvements but is still not ideal.
Reusing the WAL page cache is much more complex than our MemTable `ConvertToSST` solution, because it involves many rocksdb changes (one challenge is that multiple CFs share the same WAL; can we reuse the blob file manager?).