Closed philrz closed 4 years ago
Incidentally, while writing this up I noticed that zq
can't actually read back the output from zeek-cut
that's shown above. I think this could be seen as a zeek-cut
bug, so I've opened https://github.com/zeek/zeek/issues/997 to see what the Zeek devs/community make of it. If they make an argument defending the current behavior and don't intend to change it, I'll open a separate zq
issue to see if we can adapt to it.
In an offline discussion, decision was made to modify cut to output any record with a (non-zero) subset of the cut fields. This should address the issue noted above, and was considered to be preferable behavior for regular zql use as well.
Verified in Brim commit e7b06f3
talking to zqd
commit 4e6680f
. Per the attached video, I can now select the same diverse set of columns, Export to ZNG, re-import, and see data where there'd been nothing before.
A couple things that came up for me during verification:
I couldn't bring back the column headers after re-importing the ZNG. I've opened https://github.com/brimsec/brim/issues/873 to pursue that topic separately.
At the command line, I noticed that the new cut
still gives a warning if the requested field(s) return no data at all, but no warning if it's a mix of present/absent data.
$ zq -t "cut foo" weird.log.gz
Cut field foo not present in input
$ zq -t "cut foo,bar" weird.log.gz
Cut fields foo,bar not present together in input
$ zq -t "cut foo,name" weird.log.gz
#0:record[name:bstring]
0:[TCP_ack_underflow_or_misorder;]
0:[truncated_header;]
...
I sense a warning may still be justified in this final case, e.g. to help call the user's attention to a possible typo. I've got a question pending to @nwt on this topic, and other opinions are welcomed.
New processor functionality is needed to allow the Brim "Export to ZNG" option to work as the user would expect when the Columns chooser has been invoked. The need for this was first identified when verifying brim/814.
Consider the following query result in Brim
v0.10.0
after having imported this 2018 wrccdc pcap.Starting from this diverse mix of
http
anddns
events, let's say I click the Columns chooser and check off a mix of fields that appear in both event types (ts
,uid
) and fields that appear in only one event type (uri
from thehttp
events, andquery
from thedns
events).If I click Export, the
results.zng
that's produced contains zero events. This is a side-effect of how the app currently implements the Export function: Thecut
processor is used to askzqd
to trim the query response to contain only the named columns. However, the waycut
is currently implemented, a record is only returned if all the named fields are present. This can be seen by dropping to the command line and executing equivalentzq
by hand on theall.zng
that was generated by the import.So the high-level requirement is that
zqd
needs to provide some variation on the currentcut
behavior that returns ZNG that could be re-imported to produce the same results the user was looking at when they clicked Export. It seems like this could be done via a new option oncut
(which makes sense to me, but I'm open to suggestion) or a whole new processor. I'm not advocating for a change in the default behavior ofcut
(though I'd not necessarily rule it out).While writing this up, I'm reminded that FWIW,
zeek-cut
is more forgiving. Turning my query result into Zeek TSV and doing the equivalent cut operation, I get all the diversity:This speaks to another implementation detail I've pre-discussed with @henridf: How would would the ZNG handle the fields not present? For instance, would the descriptor for the
dns
event contain auri
field that gets populated with anull
value, and likewise would the descriptor for thehttp
event contain aquery
field that's populated withnull
? If the implementation can handle it, I would argue the answer to this is "no". Like we just saw withzeek-cut
it would seem ideal if this new cutting functionality could effectively trim the descriptor of each record to cover only the set of named fields that it contains.