Replace msgpack encoding with ttb

Overview

This is one of two related PRs to replace msgpack with ttb as the default on-disk encoding format for TS. The other is https://github.com/basho/riak_kv/pull/1510.

Msgpack was originally chosen as the encoding format because in early TS development (before the query pipeline was even written), micro-benchmarking of decoding riak objects in C++ for secondary filtering found that msgpack was significantly faster than naive implementation of the ei decoding library for ttb. In the context of the end-to-end query pipeline, however, that decoding in C++ represents less than 10% of the total query latency. Far and away the largest contribution to the latency is converting encoded records back to erlang terms in riak_kv_qry:decode_results() when they are returned to riak from eleveldb. In this context, the binary_to_term BIF is significantly more efficient than our implementation of msgpack decoding -- up to 30% faster for queries of 100K rows, and increasing with the size of the query.

You can find more discussion of the original micro-benchmarks, the query pipeline, and performance testing of this branch at https://github.com/basho/internal_wiki/wiki/Performance-Improvements-for-Riak#ts1.5.

The purpose of these PRs is to improve query performance by replacing msgpack with ttb encoding as the default on-disk format for TS data.

Changes in this repo

Summary The class RangeScanTask defined in workitems.h/cc, is responsible for the streaming fold that executes a TS query. When a secondary filter is passed in with the range scan options, the RangeScanTask constructor parses it into an expression tree that is vetted against each record on disk by decoding the record, populating the tree with values for the fields specified in the filter, and evaluating the tree. Decoding the record is accomplished by an Extractor object which understands how to unpack a riak object and decode the data payload (defined in extractor.h/cc).

In previous versions of TS, only msgpack encoded records were legal, and the inherited class ExtractorMsgpack was used to decode all records. That extractor lives inside the RangeScanTask object and was created when RangeScanTask was instantiated. ExtractorMsgpack uses the class CmpUtil for all msgpack-related decoding operations, via the cmp library.

This PR adds a new inherited extractor, ExtractorErlang that decodes ttb-encoded records. It uses the class EiUtil for all ttb-related decoding operations, via the ei library. EiUtil is a direct analog of CmpUtil and is structured in much the same way.

In this PR, RangeScanTask has been modified no longer to instantiate a single extractor object, but a map of extractors for all valid encoding types. When records are decoded, the encoding byte recorded with the riak object is used as an index into the map, to obtain the correct extractor for the current record. In this way, the on-disk encoding no longer needs to be specified by the caller, and any mix of valid encodings can be processed seamlessly by the fold.

Details c_src/DataType.h TTB encoding uses a special type ("small big") for large-integer encoding that must be handled separately from existing types, and a corresponding enum has been added here.

c_src/EiUtil.h/cc As mentioned in the summary above, EiUtil collects together all operations needed for decoding and formatting TTB-encoded data. It is directly analogous to CmpUtil for msgpack, and is structured accordingly. It is based on the erlang ei binary decoding interface, but one of the main methods used in the fold, EiUtil::skipTerm has been written from scratch, to avoid having to use ei decode calls to advance the index pointer when skipping over fields that are not part of the filter. This replacement is responsible for a 4x speedup, making ExtractorErlang as fast as ExtractorMsgpack (see Performance below)

c_src/eleveldb.cc Because we no longer make assumptions about the on-disk encoding format, but inspect each record as it's encountered, we no longer need the caller to specify an encoding. The unused encoding atoms have therefore been removed, and the option -- if specified -- is simply ignored.

c_src/ErlUtil.cc When constructing error messages, I've changed the default behavior of ErlUtil::formatTerm to attempt to format binaries as human-readable strings, since most 'binaries' used in TS are actually strings to begin with.

c_src/StringBuf.cc A method has been added to return a C++ string version of the internal char* buffer.

c_src/exceptionutils.h Not used in production code, but always useful during development, I'm adding a simple macro that allows dumping messages to disk.

c_src/extractor.cc The new inherited extractor ExtractorErlang has been added to this file, along with the ExtractorMap class for managing a map of extractors. Base-class riakObjectCanBeParsed method has been modified to decode the encoding byte and check it against known valid encodings, and extractRiakObject has been moved into the baseclass. In both ExtractorErlang and ExtractorMsgpack an additional check has been added that stops the decoding when all fields referenced by the filter have been encountered. This further optimizes performance for long records by only decoding the part of the object we need to evaluate the filter.

c_src/filter.h I realized that by allowing NULLs, we had introduced a logical error in evaluation of binary AND and OR expressions. When NULLs were not allowed, we checked if each operand to the binary expressions had a value, and the expression evaluated to false if either didn't. Since has_value() is now used to indicate a valid NULL value, these checks must be removed, and both operands must be evaluated.

c_src/filter_parser.h/cc Because the single extractor has been changed to an ExtractorMap type, every method of filter_parser has been changed to take an ExtractorMap& argument instead. When fields are added by a filter condition, they are now added to every extractor in the map, since we do not know up-front which ones will be needed during the fold.

c_src/workitems.h As discussed in the summary above, RangeScanTask has been modified to manage a map of extractors, rather than a single extractor, and RangeScanTask::DoWork now looks up the right extractor in the map for each disk record on the fly. All code related to parsing the deprecated encoding type from the filter options has been removed.

test/sut.erl Changes to test filtering over interleaved msgpack and ttb-encoded records, and to provide coverage for decoding any integer data type that may be encountered. Previous version of sut.erl tested integer operations, but only for small ints. For both msgpack and ttb, integers can be encoded as a variety of types depending on value, and the current test therefore explicitly tests all regimes to exercise this behavior. Finally, tests for invalid encoding specification have been removed, since specifying an encoding is no longer required, and an invalid encoding therefore no longer generates an error.

Compatibility

Backwards Compatibility Because the new code uses the on-disk encoding flag to determine which extractor to use, data previously written in msgpack format will continue to be correctly interpreted in TS1.5. As noted in https://github.com/basho/riak_kv/pull/1510, the riak_kv code is also agnostic as to the encoding, and will correctly handle records returned to erlang in either format.

Downgrade Compatibility Because the riak_kv code is agnostic (already handles TTB data for KV records), older versions of TS will correctly interpret TS data written with ttb encoding. As noted in https://github.com/basho/riak_kv/pull/1510, however, if a customer downgrades to TS1.4 after writting data with TS1.5, they would need this version of eleveldb or the ttb-encoded records would be skipped. Older versions of riak_kv will work with this version of eleveldb, since the deprecated encoding option is quietly ignored if specified.

Performance

In addition to the functional testing coverage in sut.erl, extensive testing of this branch is detailed at https://github.com/basho/internal_wiki/wiki/Performance-Improvements-for-Riak#ts1.5. I reproduce a summary plot here:

msg_ttb_filter_cmp

The plot shows latencies for TS1.4 (left panel), this branch (middle panel), and the difference in ms (right panel), for a wide range of query sizes using secondary filtering. The plot demonstrates two important points:

on the short-query side (Nrow < 100), where the latency is dominated by the actual eleveldb fold operation, the difference is consistent with zero, demonstrating that the ttb decoding is as efficient as the msgpack decoding in C++
on the large-query side (Nrow > 100), where the latency is dominated by decoding of returned records in erlang, this branch is more than 30% faster than TS1.4 -- for example the reduction in latency for 100K-row queries is 700 ms (out of ~2 seconds).

basho / eleveldb