brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

"over" unwraps the top level of unflattened records #5275

Open philrz opened 2 weeks ago

philrz commented 2 weeks ago

tl;dr

A strict reading of the over operator docs indicates the following record should probably be output as is, but instead its top level is broken out into a key and value pairing.

$ echo '{a:{b:{c:1}}}' | zq -z 'over this' -
{key:["a"],value:{b:{c:1}}}

Details

Repro is with Zed commit 21b7168.

Walking through the over docs, here's how the bullets say over <expr> handles different data types:

an array value generates each of its elements

$ zq -version
Version: v1.17.0-55-g21b71680

$ echo '[1,2]' | zq -z 'over this' -
1
2

Checks out. ✅

a map value generates a sequence of records of the form {key:<key>,value:<value>} for each entry in the map

$ echo '|{"APPL":145.03}| |{"GOOG":87.07}|' | zq -z 'over this' -
{key:"APPL",value:145.03}
{key:"GOOG",value:87.07}

Checks out. ✅

all other values generate a single value equal to itself.
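
To make the strict reading concrete, here is a minimal Python sketch of the three bullets above and nothing more. It is a model of the documented wording, not of the zq implementation; `ZedMap` and `over_as_documented` are hypothetical names invented for illustration, with Python lists standing in for arrays and dicts standing in for records.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class ZedMap:
    """Hypothetical stand-in for a Zed map value such as |{"APPL":145.03}|."""
    entries: dict


def over_as_documented(value: Any) -> Iterator[Any]:
    """Model of the three documented bullets only (assumption: not zq's code)."""
    if isinstance(value, list):
        # An array value generates each of its elements.
        yield from value
    elif isinstance(value, ZedMap):
        # A map value generates one {key, value} record per entry.
        for key, val in value.entries.items():
            yield {"key": key, "value": val}
    else:
        # All other values generate a single value equal to themselves.
        # Under this reading, a record (a plain dict here) lands in this branch.
        yield value
```

Under this model, a record such as `{"a": {"b": {"c": 1}}}` would come out of `over_as_documented` unchanged, which is the expectation the rest of this issue is based on.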

Noted! Then it goes on to:

Records can be converted to maps with the flatten function resulting in a map that can be traversed, e.g., if this is a record, it can be traversed with over flatten(this).

$ echo '{a:{b:{c:1}}}' | zq -z 'over flatten(this)' -
{key:["a","b","c"],value:1}

Putting aside the questionable use of the word "map" here, that also checks out. The path through the hierarchical layers of the record is reflected in the array elements ["a","b","c"]. ✅

But what happens if over is attempted on a record without using flatten? As a user, my assumption was that this would fall under the "all other values generate a single value equal to itself" rule, but it turns out that's not the case. Instead, it unwraps the top level of the record hierarchy into a key and puts the remaining levels of the record in a value:

$ echo '{a:{b:{c:1}}}' | zq -z 'over this' -
{key:["a"],value:{b:{c:1}}}
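
The difference between the two outputs can be sketched in Python as follows. This is only a model of the outputs shown in this issue, not of zq's internals; `flatten_paths` and `over_as_observed` are hypothetical names, dicts stand in for records, and the handling of multi-field and empty records is an assumption since the repro only covers a single-field record.

```python
from typing import Any, Iterator


def flatten_paths(record: dict, prefix: tuple = ()) -> Iterator[dict]:
    """Model of 'over flatten(this)': one entry per leaf, keyed by its full path."""
    for name, val in record.items():
        path = prefix + (name,)
        if isinstance(val, dict):
            # Descend into nested records (assumption: an empty record yields nothing).
            yield from flatten_paths(val, path)
        else:
            yield {"key": list(path), "value": val}


def over_as_observed(record: dict) -> Iterator[dict]:
    """Model of the observed 'over this' on a record: only the top level is unwrapped."""
    for name, val in record.items():
        # The key is a one-element path; deeper levels stay nested in the value.
        # Assumption: every top-level field is treated the same way.
        yield {"key": [name], "value": val}


rec = {"a": {"b": {"c": 1}}}
print(list(flatten_paths(rec)))     # full path to each leaf
print(list(over_as_observed(rec)))  # top-level field names only
```

Seen this way, the observed behavior resembles a one-level flatten rather than the pass-through that the docs' third bullet suggests.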

I ran into this because I wanted to operate on a hierarchical record in a lateral expression and hence needed it to pass through over undisturbed. I realized I could get what I want by temporarily treating my record as the single element of an array:

$ echo '{a:{b:{c:1}}}' | zq -z 'over [this]' -
{a:{b:{c:1}}}

But I hesitate at the thought of showing this in docs as if it's the preferred guidance to users. Indeed, when I shared all of this in a group discussion, the reaction was that the unwrapping of the top level was unexpected, so this should be revisited at some point.