AmitKumarDas / fun-with-programming

ABC - Always Be Coding
2 stars 2 forks source link

[rnd] pebbles db #21

Closed AmitKumarDas closed 2 years ago

AmitKumarDas commented 3 years ago
// refer
//
// - PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees
//
// -- Pandian Raju, University of Texas at Austin
// -- Rohan Kadekodi, University of Texas at Austin
// -- Vijay Chidambaram, University of Texas at Austin and VMware Research
// -- Ittai Abraham, VMware Research
// Key-value stores such as LevelDB and RocksDB offer 
// excellent write throughput, but suffer high write 
// amplification.
//
// The write amplification problem is due to the Log-Structured
// Merge Trees data structure that underlies these key-value
// stores. 
//
// To remedy this problem, this paper presents a novel
// data structure that is inspired by Skip Lists, termed 
// Fragmented Log-Structured Merge Trees (FLSM). 
//
// FLSM introduces the notion of guards to organize logs, 
// and avoids rewriting data in the same level. I.e. Write
// amplification is reduced considerably.
// Key-value stores have become a fundamental part of the
// infrastructure for modern systems. Much like how file 
// systems are an integral part of operating systems, distributed
// systems today depend on key-value stores for storage. For
// example, key-value stores are used to store state in graph
// databases, task queues, stream processing engines, 
// application data caching, event tracking systems, NoSQL
// stores, and distributed databases. 
//
// Improving the performance of key-value stores has the 
// potential to impact a large number of widely used data 
// intensive services.
// One fundamental problem that remains is the high write 
// amplification of key-value stores for write-intensive 
// workloads. 
//
// Write amplification is the ratio of total write IO performed 
// by the store to the total user data
//
// High write amplification increases the load on storage 
// devices such as SSDs, which have limited write cycles 
// before the bit error rate becomes unacceptable
// Write amplification also reduces write throughput: in the 
// RocksDB key-value store, it results in write throughput 
// being reduced to 10% of read throughput. Thus, reducing
// write amplification will both lower storage costs and 
// increase write throughput.
// Techniques from prior research tackling write amplification 
// have not been widely adopted since they either require 
// specialized hardware or sacrifice other aspects such as 
// search (range query) performance. Conventional wisdom 
// is that reducing write amplification requires sacrificing either
// write or read throughput. In today’s low-latency, write-intensive
// environments, users are not willing to sacrifice either.
// Key-value stores such as LevelDB and RocksDB are
// built on top of the log-structured merge trees (LSM) data
// structure, and their high write amplification can be traced
// back to the data structure itself. 
//
// LSM stores maintain data in sorted order on storage, 
// enabling efficient querying of data. However, when new data
// is inserted into an LSM store, existing data is rewritten to 
// maintain the sorted order, resulting in large amounts of write
// IO.
AmitKumarDas commented 3 years ago
// This paper presents a novel data structure, the Fragmented
// Log-Structured Merge Trees (FLSM), which combines ideas
// from the Skip List and Log-Structured Merge trees
// data structures along with a novel compaction algorithm.
// FLSM strikes at the root of write amplification by drastically
// reducing (and in many cases, eliminating) data rewrites, 
// instead fragmenting data into smaller chunks that are 
// orga nized using guards on storage. 
//
// Guards allow FLSM to find keys efficiently. Write operations on
// LSM stores are often stalled or blocked while data is compacted 
// (rewritten for better read performance); by drastically reducing write
// IO, FLSM makes compaction significantly faster, thereby 
// increasing write throughput.
// Building a high-performance key-value store on top of
// the FLSM data structure is not without challenges; the 
// design of FLSM trades read performance for write 
// throughput.
//
// This paper presents PebblesDb, a modification of the 
// HyperLevelDB key-value store that achieves the trifecta 
// of low write amplification, high write throughput, and high 
// read throughput. PebblesDb employs a collection of techniques
// such as parallel seeks, aggressive seek-based compaction, and
// sstable-level bloom filters to reduce the overheads inherent
// to the FLSM data structure. Although many of the techniques
// PebblesDb employs are well-known, together with the FLSM
// data structure, they allow PebblesDb to achieve
// excellent performance on both read-dominated and 
// write-dominated workloads.