brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

update zng type enum #1314

Closed mccanne closed 4 years ago

mccanne commented 4 years ago

Enums are currently represented by strings, but they should be encoded as ints (i.e., the positional index of the enum string in the typedef) so that end systems using zng with enums can recover the full typedef of the enum. This encoding will also serve as an optimization hint that an enum column has low cardinality.
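To illustrate the proposed encoding, here is a minimal Python sketch that stores each enum value as its positional index in the typedef. The `EnumType` class and the `tcp`/`udp`/`icmp` symbols are hypothetical and are not part of zng or zq; the sketch only shows the idea of index-based encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumType:
    """Hypothetical stand-in for an enum typedef (not a zng API)."""

    symbols: tuple  # ordered enum strings, as declared in the typedef

    def encode(self, symbol: str) -> int:
        # The wire value is the position of the string in the typedef.
        return self.symbols.index(symbol)

    def decode(self, index: int) -> str:
        # A reader holding the typedef can recover the original string.
        return self.symbols[index]


# Illustrative typedef and column; the symbol names are assumptions.
proto = EnumType(("tcp", "udp", "icmp"))
column = ["tcp", "udp", "tcp", "icmp"]

encoded = [proto.encode(s) for s in column]
decoded = [proto.decode(i) for i in encoded]

print(encoded)
print(decoded)
```

Because every value is a small integer bounded by the number of symbols in the typedef, a reader can tell from the type alone that the column has low cardinality.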

mccanne commented 4 years ago

We need to add this as an enum typedef when we renumber the typedefs and get rid of enum as a primitive type.

The design will be that enums can be of any data type, as in Rust, but will typically be a uint32.

philrz commented 4 years ago

I've verified current functionality in zq commit 18045ab, though there's more yet to come.

A summary of what's changed & current state:

  1. In the ZNG spec, enum was removed from the Primitive Types, and there is now an enum typedef instead.
  2. At the moment (zq commit 18045ab), references to enum field names in expressions resolve to their underlying value, as shown in one of the tests @mccanne added in the linked PR:
$ zq -version
Version: v0.22.0-57-g18045ab

$ cat enum.tzng 
#0:record[e:enum[int32,foo:[1],bar:[2],baz:[4]]]
0:[0;]
0:[1;]
0:[2;]

$ zq -t "put s=e:string, v=e+1" enum.tzng 
#0:record[e:enum[int32,foo:[1],bar:[2],baz:[4]],s:string,v:int64]
0:[0;foo;2;]
0:[1;bar;3;]
0:[2;baz;5;]
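The behavior in the example above can be modeled with a short Python sketch. This is not how zq implements it; `ELEMENTS` and `put_s_v` are hypothetical names, and the sketch assumes only what the output shows: each stored value is a positional index, the cast to string yields the element name, and arithmetic uses the element's underlying int32 value.

```python
# Ordered (name, underlying value) pairs, mirroring
# enum[int32,foo:[1],bar:[2],baz:[4]] from enum.tzng.
ELEMENTS = [("foo", 1), ("bar", 2), ("baz", 4)]


def put_s_v(index: int) -> dict:
    """Model of `put s=e:string, v=e+1` for one record (hypothetical helper)."""
    name, value = ELEMENTS[index]  # look up the element by positional index
    return {"e": index, "s": name, "v": value + 1}


rows = [put_s_v(i) for i in (0, 1, 2)]
for row in rows:
    # Print in a layout similar to the tzng value lines above.
    print(f"0:[{row['e']};{row['s']};{row['v']};]")
```

The printed rows match the zq output: index 2 selects `baz`, whose underlying value 4 gives `v` = 5 rather than 3.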

Follow-on work that remains: