brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Debug operator taking a flowgraph rather than an expression #5260

Open philrz opened 1 month ago

philrz commented 1 month ago

When reacting to the debug operator in its current form, @mccanne found himself wishing it could take a flowgraph rather than an expression to massage its output. I know this idea was kicked around at one point during its design, so this issue just serves as a reminder for us to contemplate that in the future.

mattnibs commented 1 month ago

Can we get an example of what this looks like?

philrz commented 1 month ago

@mattnibs: I discussed this with @mccanne and here's an example.

Let's start from this baseline:

$ echo '1 2 2 3' | zq -z -
1
2
2
3

He gave the example that he often finds himself wanting to see a by grouping of his data as debug output, e.g., what's shown here:

$ echo '1 2 2 3' | zq -z 'by debug_info:=this' -
{debug_info:1}
{debug_info:2}
{debug_info:3}

It does seem like new syntax would be needed to demarcate the inline flowgraph, e.g.,

$ echo '1 2 2 3' | zq -z 'debug (by debug_info:=this)' -

It was also pointed out that if debug supported flowgraphs like this, a user who only needs an expression could simply invoke it via yield. Right now, since only expressions are supported, debug can only work with a single value at a time, so it has no way to produce summaries like the one shown here.
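
To make the distinction concrete, here's a minimal Python sketch of the two behaviors. This is not Zed and not how zq implements debug; it only models the semantics discussed above. All of the names (debug_expr, debug_flow, group_by_this, yield_expr) are hypothetical, and buffering the whole input in debug_flow is a simplification for illustration.

```python
from typing import Any, Callable, Iterable, Iterator, List


def debug_expr(values: Iterable[Any], expr: Callable[[Any], Any],
               sink: List[Any]) -> Iterator[Any]:
    """Expression-style debug: expr sees exactly one value at a time."""
    for v in values:
        sink.append(expr(v))  # debug side channel
        yield v               # main output passes through unchanged


def debug_flow(values: Iterable[Any],
               flow: Callable[[Iterator[Any]], Iterable[Any]],
               sink: List[Any]) -> Iterator[Any]:
    """Flowgraph-style debug: flow sees the whole stream, so it can summarize."""
    buffered = list(values)  # simplification: hold the input in memory
    sink.extend(flow(iter(buffered)))
    yield from buffered      # main output passes through unchanged


def group_by_this(values: Iterator[Any]) -> Iterator[dict]:
    """Stand-in for 'by debug_info:=this': one record per distinct value."""
    seen = set()
    for v in values:
        if v not in seen:
            seen.add(v)
            yield {"debug_info": v}


def yield_expr(expr: Callable[[Any], Any]) -> Callable[[Iterator[Any]], Iterator[Any]]:
    """Stand-in for 'yield <expr>': wraps a per-value expression as a flow."""
    return lambda values: (expr(v) for v in values)


if __name__ == "__main__":
    sink: List[Any] = []
    out = list(debug_flow([1, 2, 2, 3], group_by_this, sink))
    print(out)   # [1, 2, 2, 3]
    print(sink)  # [{'debug_info': 1}, {'debug_info': 2}, {'debug_info': 3}]
```

In this model, debug_flow(values, yield_expr(f), sink) puts the same values on the debug channel as debug_expr(values, f, sink), which is the point about yield covering the expression-only case, while debug_expr has no way to express group_by_this.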