Support changing dictionaries for the Dictionary compression scheme

It is often the case that, locally, data in a column has limited support, while globally the support is much larger. If we're very lucky, this is captured by a model function, in which case we can use a compression scheme like FOR, FORLIN etc. But in other cases we just have a jumble of values which change over time. Example: the postal codes of students at a school.

For these cases it is useful to extend the Dictionary scheme so that once in a while the Dictionary can be replaced. Thus instead of the following inputs:

name	length	description
compressed_input	n	an index into the dictionary for each compressed element
dictionary_entries	m	an uncompressed value corresponding to each possible dictionary index
n	1	number of compressed (and decompressed) elements
m	1	number of dictionary entries

we'll have

name	length	description
compressed_input	n	an index into the dictionary for each compressed element
dictionary_entries	(varies)	an uncompressed value corresponding to each possible dictionary index
dictionary_application_start	d	the first position in the input (and the output) at which each dictionary comes into effect (the next dictionary start position is where the current dictionary goes out of effect)
dictionary_lengths	d	the number of dictionary entries in each of the dictionaries
n	1	number of compressed (and decompressed) elements
d	1	number of dictionaries

This should be useful for data such as that of the USDT-Ontime benchmark.

eyalroz / libgiddy

Support changing dictionaries for the Dictionary compression scheme #7