
Support read-triggered compactions in RocksDB #8004

Open dhruba opened 3 years ago

dhruba commented 3 years ago

Expected behavior

If an application incurs a high number of lookups across multiple levels of the SST tree, it is better to compact that part of the tree so that future lookups do not pay this cost. Compactions triggered by reads would reduce CPU cost as well as storage IOPS and storage space for these applications.

Actual behavior

With today's behavior, suppose the same key range resides in every level of a 7-level SST tree, and suppose no new writes to this key range are occurring. No compaction occurs, and every read request has to look through all 7 levels of data to return a result.

Bloom filters reduce the lookup cost somewhat, but even then the CPU cost of checking for the key in every one of the 7 individual levels is high.
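Until such a trigger exists, the closest workaround is a manual compaction of the hot range through the existing public API. A minimal sketch (the key-range bounds are hypothetical placeholders):

      #include <cassert>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      // Workaround, not the proposed feature: if the application knows which
      // key range is read-hot, it can manually compact just that range so
      // subsequent reads stop probing all 7 levels.
      void CompactHotRange(rocksdb::DB* db) {
        rocksdb::Slice begin("hot_range_start");  // hypothetical bound
        rocksdb::Slice end("hot_range_end");      // hypothetical bound
        rocksdb::CompactRangeOptions opts;
        // Push the result as far down the tree as possible so future Gets
        // find the keys in a single level.
        opts.bottommost_level_compaction =
            rocksdb::BottommostLevelCompaction::kForce;
        rocksdb::Status s = db->CompactRange(opts, &begin, &end);
        assert(s.ok());
      }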

Steps to reproduce the behavior

dhruba commented 3 years ago

History: RocksDB had read-triggered compactions in its very early days to account for disk I/Os: https://github.com/facebook/rocksdb/commit/c1bb32e1ba9da94d9e40af692f60c2c0420685cd. The thinking at that time was mostly to optimize disk seeks, as you can see from this blurb:

      // We arrange to automatically compact this file after
      // a certain number of seeks.  Let's assume:
      //   (1) One seek costs 10ms
      //   (2) Writing or reading 1MB costs 10ms (100MB/s)
      //   (3) A compaction of 1MB does 25MB of IO:
      //         1MB read from this level
      //         10-12MB read from next level (boundaries may be misaligned)
      //         10-12MB written to next level
      // This implies that 25 seeks cost the same as the compaction
      // of 1MB of data.  I.e., one seek costs approximately the
      // same as the compaction of 40KB of data.  We are a little
      // conservative and allow approximately one seek for every 16KB
      // of data before triggering a compaction.
      f->allowed_seeks = (f->file_size / 16384);
      if (f->allowed_seeks < 100) f->allowed_seeks = 100;

This antiquated logic triggered a compaction once the number of storage seeks into a particular file exceeded the allowed_seeks budget computed above.
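For reference, a condensed sketch of how that seek-budget mechanism operated (modeled on the LevelDB-era code quoted above; the struct and function names here are simplified, not actual RocksDB internals):

      #include <cstdint>

      struct FileMetaData {
        uint64_t file_size = 0;
        int allowed_seeks = 0;  // seek budget computed at file creation
      };

      void InitAllowedSeeks(FileMetaData* f) {
        // One "free" seek per 16KB of file data, with a floor of 100.
        f->allowed_seeks = static_cast<int>(f->file_size / 16384);
        if (f->allowed_seeks < 100) f->allowed_seeks = 100;
      }

      // Called when a Get() had to seek into this file without finding the
      // key in an earlier level. Returns true once the file's seek budget
      // is spent, i.e. when the file should be scheduled for compaction.
      bool ChargeSeek(FileMetaData* f) {
        return --f->allowed_seeks <= 0;
      }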

The new logic to trigger compactions on read should take into account not just storage I/Os but also the CPU cost of searching the index of each individual SST file.
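One possible shape for such a trigger, purely as a sketch (none of these names or constants exist in RocksDB): charge each read a CPU cost for every filter/index probe, plus a storage cost when the read actually goes to disk, and mark the file as a compaction candidate once its accumulated read cost exceeds the estimated cost of compacting it.

      #include <atomic>
      #include <cstdint>

      // Illustrative read-cost accounting; all names and constants here are
      // hypothetical, not RocksDB internals.
      struct ReadCostStats {
        std::atomic<uint64_t> accumulated_cost_us{0};
      };

      // Assumed per-operation costs, in microseconds.
      constexpr uint64_t kCpuCostPerProbeUs = 5;   // one index/filter probe
      constexpr uint64_t kIoCostPerSeekUs = 100;   // one storage read

      // Rough compaction cost: ~25 bytes of IO per byte compacted (per the
      // old comment above) at an assumed 100MB/s, i.e. 100 bytes per us.
      uint64_t EstimatedCompactionCostUs(uint64_t file_size) {
        return (file_size * 25) / 100;
      }

      // Charge one read against the file. Returns true once accumulated
      // read cost exceeds the estimated compaction cost, meaning the file
      // should become a compaction candidate.
      bool ChargeRead(ReadCostStats* stats, bool went_to_storage,
                      uint64_t file_size) {
        uint64_t cost =
            kCpuCostPerProbeUs + (went_to_storage ? kIoCostPerSeekUs : 0);
        uint64_t total = stats->accumulated_cost_us.fetch_add(cost) + cost;
        return total >= EstimatedCompactionCostUs(file_size);
      }

Unlike the old allowed_seeks budget, this kind of accounting charges every read that probes a file's filter or index, so a file can become a compaction candidate even when bloom filters keep most reads from ever reaching storage.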