Closed nweston closed 5 years ago
(Initial thoughts, better review to come.)
Table
is a bad name (tables are things with rows and columns, this has neither). I don't think AuxData
is the best name, but it at least doesn't come with the baggage that "table" does. I'd be fine with AuxiliaryData
, AncillaryData
, while DataOnTheDownLow
might be a bad choice. I'm betting we can come up with more names if we try. :-)std::get()
(it's UB to try that, see [namespace.std]p1. I think we will be okay adding a template specialization, so long as the implementation adheres to the specification of std::get()
and at least one of the types involved in the specialization is a type we define. Not certain if that's our circumstance though (I've not gotten that far).get<T>()
return a std::optional<T>
and the user can know whether it succeeded or not. If we want some sort of further information (such as errors), we could implement something akin to ErrorOr<>
from LLVM.
There are a few specific things I'd like feedback on:
Naming We talked about changing to
AuxData
but it doesn't exactly roll off the tongue. I've stuck withTable
for now. I think it describes the primary intended use pretty well, even if it doesn't capture everything the class can do.Interface I've provided a variant-like interface with an assignment operator,
get
, andget_if
, which minimized the changes to existing code. I couldn't find any indication of whether one is supposed to overloadstd::get
or implement it in your own namespace. Any opinions? I've gone with the latter for now but it should be an easy change.Error handling
get()
does its type check withExpects
. If you try to read the wrong type it causes and assertion failure/crash. This doesn't seem great but I'm not sure what other option we have without exceptions. I also providedget_if()
which returnsnullptr
on a type mismatch, but it's a little more cumbersome to use.Serialization format Last week we talked about adding more structure to the protobuf message for Tables, with the idea that this would simplify cross-language access. But I don't think this is actually possible because protobuf doesn't have templates/type variables. So basically it can't express the fact that sequence can contain many different types, but any given sequence has to be homogeneous. Instead you end up with every element of a sequence possibly having a different type. Not only is this inefficient, it would allow clients in other languages to create tables that the C++ API can't read.
Given that limitation, I decided to stick with the current approach of serializing the contents to a byte array and having protobuf store that. Clients in other languages will need some code to unpack the byte arrays, but the format is pretty simple and this shouldn't be a huge burden. We could provide example code in one or two common languages when we get to that point.
Overview
The interface here is pretty similar to the old tables:
The big difference is with nested containers. Previously if you wanted, say, a map of ints to strings, you had to use
std::map<int64_t, table::ValueType>
, whereValueType
is itself a variant. Iterating over that map involved multiple levels ofget
calls:Now you can store a
map<int, string>
(or any other arbitrary nesting of containers) directly, and you only need a singleget
call to access it.The other notable improvements are support for all integral types, and tuples.
Serialization
There are two pieces to this. First, we generate a portable type ID. Containers are generalized (e.g.
std::vector
becomessequence
), and integers are given a name which reflects their exact size (e.g.int
becomesint32_t
or whatever is appropriate). So you end up with a string like "sequence<map<int32_t, string>>".Second, the structure is flattened into a byte array. For primitive types, this means swapping to little-endian order and copying the bytes directly. For containers, this means writing out the number of elements (as a
uint64_t
), then writing out all the elements in a contiguous block.When deserializing, we initially just load the byte array. Unpacking is deferred until someone calls
get
. At that point, we check that the type-id of the requested type matches what we serialized. For example, if you serialize a 32-bitint
on one platform and try toget
it as a 64-bitint
on another platform, you'll get an error at runtime.If the types match, we then unpack the byte array, which is a straightforward reversal of the packing process.