facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

Towards a more Configurable, Customizable, and Extensible RocksDB #6533

Open mrambacher opened 4 years ago

mrambacher commented 4 years ago

This document outlines a mechanism for providing configuration-time support for extending and configuring RocksDB via the Options files. A means of adding functionality to RocksDB via customizable extensions is described and provided.

RocksDB provides a rich set of options and configurations for the programmer, allowing RocksDB to be used in many different environments. Programmers can experiment with different implementations and settings for various RocksDB features and see what works best for their environment.

Wouldn’t it be nice to be able to try these configurations out at run-time without rebuilding the executable? What about being able to use HDFS via a configuration-time flag, or to try out different FilterPolicy or Cache implementations? What about being able to try out different database implementations (like Backupable or TTL) without recompiling?

======

RocksDB uses an Options file to store its configuration settings. The Options file outlines how RocksDB was configured at the time and can be used to rehydrate many of the settings within a RocksDB instance on restart.

Unfortunately, not all of the RocksDB properties can be configured via the Options file. For some objects, the Options file may reflect the name of the class being used but not any configuration properties. Furthermore, when these classes are loaded from the Options file, the loader does not know how to create or rehydrate many of those classes, meaning that these objects must be configured in code in order to recreate the original settings.

==========

Historically, RocksDB objects have been configured by creating a string-to-OptionTypeInfo map, where the string is the name of the property to configure and the OptionTypeInfo describes the type of that property. APIs were then added to get/set the properties and compare the options (GetDBOptionsFromMap, GetColumnFamilyOptionsFromMap, etc.). Each object class corresponded to a single map and multiple APIs. These APIs then needed to be called in the proper places to convert the objects to/from strings as appropriate. Most of these APIs were very similar in nature but not quite identical. Additionally, each new base type required updating an enumerated type and adding code (in three places) to support that type. Finally, some types could not be handled by the base map code and were special-cased in the APIs.
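The map-plus-enum pattern described above can be sketched roughly as follows. This is a simplified illustration, not the RocksDB source: `ExampleOptions`, `kExampleTypeInfo`, `ParseOption`, and the reduced `OptionType` enum are hypothetical stand-ins, and the `OptionTypeInfo` struct here is a two-field toy version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>
#include <unordered_map>

// Hypothetical, reduced enum: every new base type needs a new entry here
// and a matching case in each parse/serialize/compare switch.
enum class OptionType { kBoolean, kInt, kUInt64 };

// Toy version of the per-property descriptor (not the RocksDB definition).
struct OptionTypeInfo {
  std::size_t offset;  // byte offset of the field inside the options struct
  OptionType type;     // selects the conversion code below
};

// Hypothetical options struct used only for this sketch.
struct ExampleOptions {
  bool create_if_missing = false;
  int max_open_files = -1;
  std::uint64_t max_total_wal_size = 0;
};

// One map per object class: property name -> where it lives and how to parse it.
static const std::unordered_map<std::string, OptionTypeInfo> kExampleTypeInfo = {
    {"create_if_missing",
     {offsetof(ExampleOptions, create_if_missing), OptionType::kBoolean}},
    {"max_open_files",
     {offsetof(ExampleOptions, max_open_files), OptionType::kInt}},
    {"max_total_wal_size",
     {offsetof(ExampleOptions, max_total_wal_size), OptionType::kUInt64}},
};

// Sets one property from its string form; returns false for an unknown name
// or a value that cannot be parsed.
bool ParseOption(const std::string& name, const std::string& value,
                 ExampleOptions* opts) {
  assert(opts != nullptr);
  auto it = kExampleTypeInfo.find(name);
  if (it == kExampleTypeInfo.end()) return false;
  char* addr = reinterpret_cast<char*>(opts) + it->second.offset;
  try {
    switch (it->second.type) {
      case OptionType::kBoolean:
        if (value != "true" && value != "false") return false;
        *reinterpret_cast<bool*>(addr) = (value == "true");
        return true;
      case OptionType::kInt:
        *reinterpret_cast<int*>(addr) = std::stoi(value);
        return true;
      case OptionType::kUInt64:
        *reinterpret_cast<std::uint64_t*>(addr) = std::stoull(value);
        return true;
    }
  } catch (const std::exception&) {
    return false;  // std::stoi / std::stoull rejected the value
  }
  return false;
}
```

Even in this reduced form, the switch has to be repeated for serialization and comparison, which is the duplication the paragraph above refers to.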

Presumably because of the number of steps involved and potential complexity, there are a large number of objects that cannot be configured via the Options file. To address some of the deficiencies in the RocksDB options management, the OptionTypeInfo class was expanded and a Configurable base class was developed.

======

The first improvement is to the OptionTypeInfo class. The goal is to reduce the amount of code required to add new configurable objects. These changes minimize, or ideally eliminate, the special cases for objects that cannot be handled via the OptionTypeInfo map. The improvement also eliminates the need for many of the option types.

The second improvement is the addition of a Configurable class. This class has been added to bring the implementations and APIs together. The Configurable base class defines a means of registering a data object with its options map. With this registration in place, the Configurable base class can standardize the implementation of the basic options methods (converting to/from string and comparison of objects). The Configurable class knows how to take maps of name-value pairs read from an Options file and update the corresponding values in the object.

Configurable classes provide the following functionality:

- Initializing object variables from input name-value property strings
- Converting object variables into name-value properties
- Comparing two Configurable objects for equality

The implementer of a Configurable class must only register the mappings of properties to object variables. The Configurable base class provides the rest of the functionality.
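As a rough illustration of that division of labor, the sketch below shows a base class that implements configure, serialize, and compare once, while the derived class only registers its fields. It is a minimal sketch under stated assumptions, not the RocksDB Configurable class: the method names `ConfigureFromMap`, `ToString`, `Matches`, and `RegisterInt`, and the `ExampleCache` class, are hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch of the idea; not the RocksDB Configurable class.
class Configurable {
 public:
  Configurable() = default;
  // Registered options point at member fields, so copying is disabled.
  Configurable(const Configurable&) = delete;
  Configurable& operator=(const Configurable&) = delete;
  virtual ~Configurable() = default;

  // Applies name-value pairs; stops at the first unknown name or bad value.
  bool ConfigureFromMap(const std::map<std::string, std::string>& props) {
    for (const auto& kv : props) {
      auto it = options_.find(kv.first);
      if (it == options_.end() || !it->second.parse(kv.second)) return false;
    }
    return true;
  }

  // Serializes every registered property as "name=value;" in name order.
  std::string ToString() const {
    std::string out;
    for (const auto& kv : options_) {
      out += kv.first + "=" + kv.second.serialize() + ";";
    }
    return out;
  }

  // Two objects match when all registered properties serialize identically.
  bool Matches(const Configurable& other) const {
    return ToString() == other.ToString();
  }

 protected:
  struct Option {
    std::function<bool(const std::string&)> parse;
    std::function<std::string()> serialize;
  };

  // The only step a derived class performs: map a property name to a field.
  void RegisterInt(const std::string& name, int* field) {
    assert(field != nullptr);
    options_[name] = Option{
        [field](const std::string& v) {
          try {
            *field = std::stoi(v);
            return true;
          } catch (const std::exception&) {
            return false;
          }
        },
        [field]() { return std::to_string(*field); }};
  }

 private:
  std::map<std::string, Option> options_;
};

// Hypothetical configurable object: registers two fields and nothing else.
class ExampleCache : public Configurable {
 public:
  ExampleCache() {
    RegisterInt("capacity", &capacity_);
    RegisterInt("num_shard_bits", &num_shard_bits_);
  }
  int capacity() const { return capacity_; }

 private:
  int capacity_ = 1024;
  int num_shard_bits_ = 4;
};
```

With this shape, calling `ConfigureFromMap({{"capacity", "4096"}})` on an `ExampleCache` updates the field, and `ToString()` produces text that could be written back to an options file.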

Additionally, the Configurable class provides interfaces to Prepare and Verify that the settings of an object are correct. The “Prepare” method can be used by the class to perform any initialization based upon its settings. The “Verify” method can be used to validate that the settings are a valid configuration. It is up to the implementer of the derived class to define what these functions mean for a given class.
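One way a derived class might give meaning to these two hooks is sketched below. The signatures, the simplified `Status` struct, and the `ExampleRateLimiter` class are assumptions made for illustration; the actual interfaces may differ.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for a status type (hypothetical, not rocksdb::Status).
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical base class exposing only the two hooks discussed above.
class Configurable {
 public:
  virtual ~Configurable() = default;
  // Performs any initialization that depends on the configured settings.
  virtual Status Prepare() { return {true, ""}; }
  // Checks that the current settings form a valid configuration.
  virtual Status Verify() const { return {true, ""}; }
};

// Hypothetical derived class: each hook means whatever the class needs.
class ExampleRateLimiter : public Configurable {
 public:
  long long bytes_per_second = 0;
  long long refill_period_us = 100000;

  Status Verify() const override {
    if (bytes_per_second <= 0) {
      return {false, "bytes_per_second must be positive"};
    }
    if (refill_period_us <= 0) {
      return {false, "refill_period_us must be positive"};
    }
    return {true, ""};
  }

  Status Prepare() override {
    Status s = Verify();
    if (!s.ok) return s;
    // Derive a cached value from the settings once they are known to be valid.
    bytes_per_refill_ = bytes_per_second * refill_period_us / 1000000;
    prepared_ = true;
    return {true, ""};
  }

  long long bytes_per_refill() const {
    assert(prepared_);  // Prepare() must have succeeded first
    return bytes_per_refill_;
  }

 private:
  long long bytes_per_refill_ = 0;
  bool prepared_ = false;
};
```

Calling `Verify` from inside `Prepare` is a design choice of this sketch only; the text above leaves the relationship between the two methods to the implementer.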

The third improvement is the addition of the Customizable class. Many classes provide alternative implementations in order to provide different functionality. For example, there are three TableFactory implementations in the system: Plain, BlockBased, and Cuckoo. Users can also provide their own implementations of classes such as MergeOperator or Comparator. The Customizable class standardizes the method of instantiating these alternative implementations. It builds on the Configurable class hierarchy and provides factory and extension capabilities. A Customizable class defines its type (“TableFactory”) and has a factory method (e.g., TableFactory::CreateFromString). Each alternative implementation of this class defines a unique Name for the implementation (e.g., “PlainTableFactory”, “BlockBasedTableFactory”). When the factory method is called, the factory will look for the named implementation and return a new instance of the requested class. If further properties are passed in to the factory method, the new object will be configured using those properties.

The fourth improvement adds a mechanism for an object to be registered with the ObjectRegistry. This class enables object factories to be registered without modifying the source code. Instead, the developer provides a dynamic library and an associated method so that their object can be registered with the object registry. This registration will cause RocksDB to load the library into the RocksDB process, locate the named method, and invoke the method with the supplied argument. The invoked function can then register new objects and types.
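The lookup-by-name factory and the registration step might fit together roughly as in the sketch below. It is a self-contained toy, not the RocksDB API: the `ObjectRegistry` interface, the `Properties` alias, the `Configure` hook, and `RegisterExampleFactories` are assumptions, and the dynamic-library loading step is omitted (the registration function is simply called directly).

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Properties = std::map<std::string, std::string>;

// Toy Customizable type: a name, an optional configure step, and a factory.
class TableFactory {
 public:
  virtual ~TableFactory() = default;
  virtual const char* Name() const = 0;
  // Default implementation accepts only an empty property set.
  virtual bool Configure(const Properties& props) { return props.empty(); }
  static std::unique_ptr<TableFactory> CreateFromString(
      const std::string& name, const Properties& props = {});
};

// Hypothetical registry: implementation name -> function creating an instance.
class ObjectRegistry {
 public:
  using Factory = std::function<std::unique_ptr<TableFactory>()>;
  static ObjectRegistry& Default() {
    static ObjectRegistry registry;
    return registry;
  }
  void Register(const std::string& name, Factory factory) {
    assert(factory);
    factories_[name] = std::move(factory);
  }
  std::unique_ptr<TableFactory> New(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Finds the named implementation, then applies any extra properties.
std::unique_ptr<TableFactory> TableFactory::CreateFromString(
    const std::string& name, const Properties& props) {
  std::unique_ptr<TableFactory> result = ObjectRegistry::Default().New(name);
  if (result && !result->Configure(props)) result.reset();
  return result;
}

class PlainTableFactory : public TableFactory {
 public:
  const char* Name() const override { return "PlainTableFactory"; }
};

class BlockBasedTableFactory : public TableFactory {
 public:
  const char* Name() const override { return "BlockBasedTableFactory"; }
  bool Configure(const Properties& props) override {
    for (const auto& kv : props) {
      if (kv.first != "block_size") return false;
      try {
        block_size_ = std::stoi(kv.second);
      } catch (const std::exception&) {
        return false;
      }
    }
    return true;
  }
  int block_size() const { return block_size_; }

 private:
  int block_size_ = 4096;
};

// Stand-in for the method a dynamic library would export; in the mechanism
// described above, RocksDB would locate and invoke it after loading the library.
void RegisterExampleFactories(ObjectRegistry& registry) {
  registry.Register("PlainTableFactory", [] {
    return std::unique_ptr<TableFactory>(new PlainTableFactory());
  });
  registry.Register("BlockBasedTableFactory", [] {
    return std::unique_ptr<TableFactory>(new BlockBasedTableFactory());
  });
}
```

After `RegisterExampleFactories(ObjectRegistry::Default())` has run, `TableFactory::CreateFromString("BlockBasedTableFactory", {{"block_size", "8192"}})` returns a configured instance, while an unregistered name yields a null pointer.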

Finally, the ObjectRegistry supports writing its contents to the Options file. Upon reading the Options file, the ObjectRegistry can reload the registered objects from the information supplied in the Options file.

========

Using this methodology, any tool that uses the RocksDB Options file for initialization can use these extension packages with no code changes. The RocksDB-Cloud environment is an example. In LDB, one can set an option in LDB’s options file to dynamically load the RocksDB library at initialization time, and then rerun with an additional option set to dynamically load the RocksDB-Cloud environment. A PR is pending for MyRocks that will enable configurable items in the MySQL cnf file to be passed through to MyRocks and transformed into RocksDB options. Once this PR makes it upstream, one can deploy MySQL with the MyRocks storage engine configured to use RocksDB with the default Env and then subsequently with the RocksDB-Cloud Env.

More broadly, these five improvements collectively open the possibility for:

- an existing product to be modified/extended via configuration. For example, different implementations and configurations can be experimented with without recompiling.
- new capabilities to be added to RocksDB without becoming part of the base source code. There are several examples of such functionality built into RocksDB, from the HDFS and RADOS environments to the TPP cache to the extensions for Cassandra. When using shared libraries, these features can be added to existing code without recompilation.
- alternative implementations to be added to other languages (such as Java) without an explosion in the number of APIs and cross-language points. For example, by using the CreateFromString method of the Env class, eight JNI entry points could be replaced with two entry points while supporting more potential Env types. Other types may have comparable API savings and increased implementations.
- improved testing of alternatives. Using this methodology, it is now possible to plug different configurations into the tests without developing more testing infrastructure. The base tests can point at an alternative configuration via configuration strings or files.

========

What is left to do? Using this functionality, it would be possible to develop an “extension” package for RocksDB. Inside of this package/directory developers could contribute their own extensions to RocksDB that others could try. These experimental packages could be tested by more of the community before they were accepted (or not) into the base source package.

To add new implementations via configuration, dynamic libraries must be supported. However, most RocksDB installations appear to use static libraries. When a statically linked executable also loads a dynamic library, the core RocksDB code ends up in the process twice (once via the static library and once via the dynamic library). This duplication can cause issues during the execution of the program (especially during shutdown). Further investigation and fixes to RocksDB may alleviate some of these issues.

Some Configurable classes are meant to share instances between objects. For example, there may be multiple objects configured to use the same Logger or Cache. A solution to share these objects has not yet been developed. One idea is to use the ObjectRegistry to store and register these “shared” objects for later discovery.
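Since no solution exists yet, the following is only one possible shape for the registry idea: objects are stored under an identifier, and a later request for the same identifier returns the live instance rather than creating a new one. `SharedObjectRegistry`, `GetOrCreate`, and the toy `Cache` struct are all hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a shareable object such as a Cache or Logger.
struct Cache {
  std::size_t capacity;
};

// Hypothetical registry of shared instances, keyed by an identifier that
// could be written to and read back from an options file.
class SharedObjectRegistry {
 public:
  // Returns the live object registered under `id`; if none exists (or the
  // previous one has been destroyed), creates one with `make` and records it.
  template <typename Make>
  std::shared_ptr<Cache> GetOrCreate(const std::string& id, Make make) {
    auto it = objects_.find(id);
    if (it != objects_.end()) {
      if (std::shared_ptr<Cache> existing = it->second.lock()) {
        return existing;
      }
    }
    std::shared_ptr<Cache> created = make();
    assert(created != nullptr);
    objects_[id] = created;
    return created;
  }

 private:
  // weak_ptr: the registry does not keep objects alive on its own.
  std::map<std::string, std::weak_ptr<Cache>> objects_;
};
```

Holding weak references is one design choice among several; it lets the last user release the object, at the cost of re-creating it if it is requested again later.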

Not all of the classes that have configurations have been converted to Configurable ones. Similarly, not all classes that should be Customizable have been implemented as such.

There should be a new DBPlugin Customizable class. This class would allow alternative variants of a DB (e.g. StackableDB) to be created and registered without calling the database-specific Open functions. For example, it should be possible to achieve the same functionality as the static DbTtlOpen method by registering a TtlDbPlugin and calling the standard DB::Open method. This functionality is required to support configuration-time decisions about which database implementation should be used.

Even when all of the above work is complete, there will still be some Options that cannot be easily serialized and reconstructed. One specific example is callback functions. If serializing and restoring these types is important, a way of doing so will need to be developed.

mrambacher commented 4 years ago

This issue is being addressed in a series of PRs: