brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Pattern match on elements of named complex fields #983

Open philrz opened 4 years ago

philrz commented 4 years ago

ZQL currently has operators for strict equality and pattern matching on fields of "primitive" types. For example, in Zeek dns events, to find GitHub-related entries in the bstring-type field named query, both of the following do what I'd expect (all examples use zq-sample-data).

Strict equality:

_path=dns query=github.map.fastly.net

Pattern match with glob wildcard:

_path=dns query=~*github*

However, if I'm working with an array field like answers and want to run the same kinds of searches against any element of that array[bstring]-type field, ZQL only has syntax for strict membership:

_path=dns github.map.fastly.net in answers

But there's currently no way to check whether the glob pattern *github* matches any of the elements. It could be attempted by writing out answers[0]=~*github* or answers[1]=~*github* ..., but you'd need to know the upper bound on the length of answers, and the result is very clunky.
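To make the requested semantics concrete, here is a minimal sketch in Python rather than ZQL. It is not how zq implements anything. The record is a made-up stand-in for a Zeek dns event, the helper name any_element_matches is hypothetical, and Python's fnmatch globbing is only assumed to be close enough to ZQL's glob behavior for illustration.

```python
from fnmatch import fnmatchcase


def any_element_matches(values, pattern):
    """Return True if any string element of values matches the glob pattern."""
    # Non-string elements are skipped rather than raising an error.
    return any(isinstance(v, str) and fnmatchcase(v, pattern) for v in values)


# Hypothetical record shaped like a Zeek dns event (values invented for illustration).
record = {
    "_path": "dns",
    "query": "github.com",
    "answers": ["github.map.fastly.net", "192.0.2.10"],
}

# Strict membership, analogous to: github.map.fastly.net in answers
strict = "github.map.fastly.net" in record["answers"]

# Glob match against any element, the behavior this issue asks for.
globbed = any_element_matches(record["answers"], "*github*")
```

The point of the sketch is that no upper bound on the array length is needed: the test is applied to every element, and the result is true as soon as one matches.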

A couple of approaches that have been considered:

  1. Following from what already exists, one might imagine an in~ operator.
  2. We could back away from having a separate in operator and have = target elements of complex types, perhaps with a change to how the field name is referenced to indicate a test against elements, e.g. answers[]=~*github*

philrz commented 4 years ago

We discussed this one as a group yesterday. There was rough consensus that something like answers[]=~*github* or answers[...]=~*github* might be adequate if we were in a rush to address this immediately. However, it was also noted that there are some higher-level gaps in ZQL expressions that we have yet to address. If we approach those gaps with a proper language-design mindset, the syntax for this specific case would hopefully follow from that. Therefore we're going to keep thinking this one over and perhaps address it as part of a wider ZQL focus that's expected in the near future.