philrz closed this issue 4 years ago
Verified in zq commit b2c8356.
Using the same example that failed before, we see the sub-fields that make up the id
record now all appear together anchored alongside the leftmost appearance of an id.*
field.
$ zq -t -i zeek sample.tsv
#zenum=string
#0:record[_path:string,ts:time,uid:bstring,id:record[orig_h:ip,orig_p:port,resp_h:ip,resp_p:port,orig_h_name:record[src:bstring,vals:set[bstring]],resp_h_name:record[src:bstring,vals:set[bstring]]],proto:zenum,service:bstring,duration:duration,orig_bytes:uint64,resp_bytes:uint64,conn_state:bstring,local_orig:bool,local_resp:bool,missed_bytes:uint64,history:bstring,orig_pkts:uint64,orig_ip_bytes:uint64,resp_pkts:uint64,resp_ip_bytes:uint64,tunnel_parents:set[bstring],orig_cc:bstring,resp_cc:bstring]
0:[conn;1598243094.015046;CWjxkd3jpmxuvN21uj;[10.124.2.117;61927;10.70.70.70;8080;[-;-;][SSL_SNI;[www.pacast.com;c.clicktale.net;www.gstatic.com;www.youtube.com;oneclient.sfx.ms;eb2.3lift.com:443;tapestry.tapad.com;www.google.com:443;bats.video.yahoo.com;oneclient.sfx.ms:443;collect.tealiumiq.com;js-sec.indexww.com:443;ctldl.windowsupdate.com;pr-bh.ybp.yahoo.com:443;bats.video.yahoo.com:443;clientservices.googleapis.com;13-237-209-96.expertcity.com:443;clientservices.googleapis.com:443;]]]tcp;-;0.002716;0;77;SF;F;F;0;FdfR;3;120;2;157;-;-;-;]
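The regrouping visible in the header above (every id.* sub-field pulled together at the position where id first appears) can be illustrated with a short Python sketch. This is not zq's actual implementation, which is written in Go; the function name regroup_columns and the shortened column list are hypothetical and only approximate the kind of input that triggered this issue.

```python
def regroup_columns(columns):
    """Reorder dotted column names so that all sub-fields of a record are
    adjacent, anchored at the record's leftmost appearance."""
    # Remember the first column index at which each record prefix was seen.
    first_seen = {}
    for index, col in enumerate(columns):
        parts = col.split(".")
        for depth in range(1, len(parts) + 1):
            first_seen.setdefault(tuple(parts[:depth]), index)

    def sort_key(col):
        # Compare columns by the first-seen position of each enclosing
        # record, outermost first, then by the column's own position.
        parts = col.split(".")
        return [first_seen[tuple(parts[:d])] for d in range(1, len(parts) + 1)]

    # sorted() is stable, so columns with equal keys keep their input order.
    return sorted(columns, key=sort_key)


# Hypothetical, shortened column order with "split" id.* fields.
split_columns = [
    "_path", "ts", "uid",
    "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
    "proto", "service",
    "id.orig_h_name.src", "id.orig_h_name.vals",
    "id.resp_h_name.src", "id.resp_h_name.vals",
]

for name in regroup_columns(split_columns):
    print(name)
```

In this sketch the id.orig_h_name.* and id.resp_h_name.* columns move up to sit directly after id.resp_p, while proto and service shift right, which mirrors the field order of the id record in the header above.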
This also proves handy (and readable) in NDJSON.
$ zq -f ndjson -i zeek "cut id" sample.tsv | jq .
{
  "id": {
    "orig_h": "10.124.2.117",
    "orig_h_name": {
      "src": null,
      "vals": null
    },
    "orig_p": 61927,
    "resp_h": "10.70.70.70",
    "resp_h_name": {
      "src": "SSL_SNI",
      "vals": [
        "www.pacast.com",
        "c.clicktale.net",
        "www.gstatic.com",
        "www.youtube.com",
        "oneclient.sfx.ms",
        "eb2.3lift.com:443",
        "tapestry.tapad.com",
        "www.google.com:443",
        "bats.video.yahoo.com",
        "oneclient.sfx.ms:443",
        "collect.tealiumiq.com",
        "js-sec.indexww.com:443",
        "ctldl.windowsupdate.com",
        "pr-bh.ybp.yahoo.com:443",
        "bats.video.yahoo.com:443",
        "clientservices.googleapis.com",
        "13-237-209-96.expertcity.com:443",
        "clientservices.googleapis.com:443"
      ]
    },
    "resp_p": 8080
  }
}
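For readers unfamiliar with how dotted Zeek column names map onto the nested shape shown in this NDJSON, the following is a minimal Python sketch of that re-assembly. It is an illustration only and not zq's code; the helper name nest_fields is hypothetical, and it assumes no name is used both as a plain field and as a record prefix.

```python
def nest_fields(flat):
    """Turn a mapping of dotted field names into nested dictionaries."""
    nested = {}
    for name, value in flat.items():
        *parents, leaf = name.split(".")
        node = nested
        for parent in parents:
            # Create the intermediate record on first use, reuse it afterwards.
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


# A few values taken from the output above, keyed by dotted column names.
flat_row = {
    "id.orig_h": "10.124.2.117",
    "id.orig_p": 61927,
    "id.resp_h": "10.70.70.70",
    "id.resp_p": 8080,
    "id.resp_h_name.src": "SSL_SNI",
}

print(nest_fields(flat_row))
```

Because intermediate records are created on first use and reused afterwards, this particular sketch does not care whether the id.* columns were adjacent in the input.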
Thanks @nwt!
A community user was attempting to read a Zeek TSV log that had been generated by a Corelight Sensor. Sample contents:
As of zq commit 16d510a, this triggers the following error:
The root cause involves zq's internal reconstruction of the Zeek record data type into the hierarchical format as it existed when the data originated inside Zeek. One of the assumptions built into the current functionality is that all sub-fields of a record are adjacent, as is traditionally seen with id.orig_h, id.orig_p, id.resp_h, and id.resp_p. The zq reader is therefore not expecting to see the fields starting rightward from id.orig_h_name.src separated by columns representing other fields.
In an internal discussion, the Brim development team established it would be feasible to enhance the reader to accept non-adjacent record fields. However, there is no easy way to accomplish this while still preserving the column order of "split" record fields in the original data. In other words, if this Zeek TSV were read via the proposed enhanced zq, turned internally into ZNG, and then written back out again as Zeek TSV, the initial and final Zeek TSV representations would differ. The likely implementation will have the "separated" record fields made adjacent to the record fields seen "leftmost" in the column order. Therefore the above sample, when written back out from zq as Zeek TSV, may look like:
Incidentally, the way I was able to create that output is via:
And that representation is happily accepted by zq, with the re-assembly of the hierarchical record enabling shorthand like: