blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License

Declaration/syntax for sum types/union arrays? #237

Open jpivarski opened 5 years ago

jpivarski commented 5 years ago

The counterpart of record types (product types), in which each element has an instance of every field, is the tagged union type (sum type), in which each element has an instance of exactly one type from a list of possible types. For example,

union[int32, string]                                     ->   [1, 2, "three"]
union[int32, var * int32]                                ->   [1, 2, [3, 4], [], 5]
union[tuple[int32, int32], tuple[int32, int32, int32]]   ->   [(1, 2), (3, 4, 5)]

This can be implemented, for instance, with an array of tags (types), optional indexes (offsets), and an array for each type. See, for example, the Arrow implementation.
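As a rough illustration of the layout described above, here is a minimal Python sketch using plain lists: one tag per element, one offset per element, and one child array per member type. The names `build_dense_union` and `get_item` and the predicate-based type matching are assumptions made for this example; they are not part of datashape or Arrow.

```python
from typing import Any, Callable, List, Sequence, Tuple


def build_dense_union(
    values: Sequence[Any],
    type_checks: Sequence[Callable[[Any], bool]],
) -> Tuple[List[int], List[int], List[List[Any]]]:
    """Split `values` into tags, offsets, and one child list per type.

    Hypothetical helper: type_checks[k] decides whether a value belongs
    to the k-th member type of the union. The first matching check wins.
    """
    tags: List[int] = []      # which member type each element has
    offsets: List[int] = []   # position of the element in its child list
    children: List[List[Any]] = [[] for _ in type_checks]
    for value in values:
        for tag, check in enumerate(type_checks):
            if check(value):
                tags.append(tag)
                offsets.append(len(children[tag]))
                children[tag].append(value)
                break
        else:
            raise TypeError(f"{value!r} matches no member type of the union")
    return tags, offsets, children


def get_item(
    tags: List[int], offsets: List[int], children: List[List[Any]], i: int
) -> Any:
    """Look up element i by following its tag, then its offset."""
    return children[tags[i]][offsets[i]]


# union[int32, string] -> [1, 2, "three"]
tags, offsets, children = build_dense_union(
    [1, 2, "three"],
    [lambda v: isinstance(v, int), lambda v: isinstance(v, str)],
)
```

For this input the tags are `[0, 0, 1]`, the offsets are `[0, 1, 0]`, and the child arrays are `[1, 2]` and `["three"]`. Without the offsets, each child array would need one slot per element of the union, which is why the offsets are described as optional.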

Are there any plans to support specifications of this kind of data?

nugend commented 4 years ago

Expressing and dealing with variants cogently is an important element of most data processing from messy sources.

I'd like to add that it's important to be able to represent exhaustive and non-exhaustive versions of this as well. The non-exhaustive version is probably sufficient to represent with something like, "I couldn't handle this, so here's the byte sequence that was observed."
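One way to picture the distinction, as a sketch only: a non-exhaustive union keeps a fallback case holding the raw bytes that no member type could parse, while an exhaustive union rejects such input. Everything here (`FALLBACK_TAG`, `classify`, `classify_exhaustive`, `parse_int32`) is hypothetical and made up for illustration; none of it is proposed datashape syntax.

```python
from typing import Any, Callable, Sequence, Tuple

FALLBACK_TAG = -1  # assumed marker for "no member type matched"


def parse_int32(raw: bytes) -> int:
    """Parse ASCII decimal bytes as a value in the int32 range."""
    value = int(raw.decode("ascii"))
    if not -2**31 <= value < 2**31:
        raise ValueError("out of int32 range")
    return value


def classify(
    raw: bytes, parsers: Sequence[Callable[[bytes], Any]]
) -> Tuple[int, Any]:
    """Non-exhaustive union: fall back to the observed byte sequence."""
    for tag, parse in enumerate(parsers):
        try:
            return tag, parse(raw)
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            continue
    return FALLBACK_TAG, raw


def classify_exhaustive(
    raw: bytes, parsers: Sequence[Callable[[bytes], Any]]
) -> Tuple[int, Any]:
    """Exhaustive union: input that matches no member type is an error."""
    tag, value = classify(raw, parsers)
    if tag == FALLBACK_TAG:
        raise ValueError(f"{raw!r} matches no member type of the union")
    return tag, value


records = [classify(raw, [parse_int32]) for raw in (b"1", b"2", b"\xff\xfe")]
```

Here `records` ends up as `[(0, 1), (0, 2), (-1, b"\xff\xfe")]`: the last input is kept verbatim under the fallback tag, whereas `classify_exhaustive` would raise on it.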