brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

nameof() should support type values #5067

Closed philrz closed 3 months ago

philrz commented 3 months ago

tl;dr

The following should return "port".

$ echo '80(port=int16)' | zq -z 'yield nameof(<port>)' -
error("missing")

Details

Repro is with Zed commit 38763f8.

A community user recently inquired in a Slack thread:

I am using typeof(this) to get the types of records in my JSON file. Would it be possible to return just a simplified version of the type (i.e. "http", "files") rather than the entire type itself?

image

Based on the screenshot, it appears the user is following the Zed docs for shaping Zeek NDJSON. Starting from there, there are a couple of ways to get the Zeek record type from the record values: yield _path (since with Zeek data the _path field happens to map 1-to-1 with the record types), or nameof(this), as in this simplified example based on the Zeek shaping docs:

$ cat shaper.zed 
type port=uint16
type conn_id={orig_h:ip,orig_p:port,resp_h:ip,resp_p:port}
type zenum=string
type dns={_path:string,ts:time,uid:string,id:conn_id,proto:zenum,trans_id:uint64,rtt:duration,query:string,qclass:uint64,qclass_name:string,qtype:uint64,qtype_name:string,rcode:uint64,rcode_name:string,AA:bool,TC:bool,RD:bool,RA:bool,Z:uint64,answers:[string],TTLs:[duration],rejected:bool,_write_ts:time}
shape(this, <dns>)

$ zq -version
Version: v1.14.0-16-g38763f82

$ echo {} | zq -I shaper.zed '| yield nameof(this)' -
"dns"

However, if we start from the user's position of wanting to derive this info from the type value that is returned by the typeof function, that's not currently possible via nameof.

$ echo {} | zq -I shaper.zed '| yield typeof(this)' -
<dns={_path:string,ts:time,uid:string,id:conn_id={orig_h:ip,orig_p:port=uint16,resp_h:ip,resp_p:port},proto:zenum=string,trans_id:uint64,rtt:duration,query:string,qclass:uint64,qclass_name:string,qtype:uint64,qtype_name:string,rcode:uint64,rcode_name:string,AA:bool,TC:bool,RD:bool,RA:bool,Z:uint64,answers:[string],TTLs:[duration],rejected:bool,_write_ts:time}>

$ echo {} | zq -I shaper.zed '| yield nameof(typeof(this))' -
error("missing")

We discussed this one as a group and there was consensus that it would be good to add support for this. In the meantime, in addition to the workarounds shown above that derive the type name from the original value, I also offered the user this hacky workaround that extracts the name from the string representation of the record type definition.

$ echo {} | zq -I shaper.zed '| yield split(string(typeof(this)), "=")[0][1:]' -
"dns"
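
For anyone curious what that expression is doing, here is a rough plain-shell sketch of the same string manipulation, with no zq involved. The function name type_name_from_string is made up for illustration, and the sample type strings are shortened stand-ins for real typeof output.

```shell
# Hypothetical helper (not part of zq): mimic split(s, "=")[0][1:]
# on the string form of a type value.
type_name_from_string() {
  s=$1
  s=${s%%=*}   # keep only the text before the first "="
  s=${s#?}     # drop the first character, i.e. the leading "<"
  printf '%s\n' "$s"
}

type_name_from_string '<dns={_path:string,ts:time}>'
type_name_from_string '<port=uint16>'
```

Note the approach only makes sense for named types. Given a string with no "=", such as '<{a:int64}>', this shell version just strips the leading "<" and returns the rest, which is not a type name.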

philrz commented 3 months ago

Verified in Zed commit aca5032.

Both of the repro cases above now return the expected type name rather than error("missing") as they did before.

$ zq -version
Version: v1.14.0-25-gaca50328

$ echo '80(port=int16)' | zq -z 'yield nameof(<port>)' -
"port"

$ echo {} | zq -I shaper.zed '| yield nameof(typeof(this))' -
"dns"

Thanks @mattnibs!