brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License
1.38k stars 67 forks source link

Something like jq's "debug" #5181

Closed philrz closed 1 month ago

philrz commented 2 months ago

tl;dr

In a recent community Slack thread, a user asked of zq:

is there a way to do jq’s debug?

Zed doesn't currently have a 1-to-1 equivalent to this, but a lot of the building blocks are there. To benefit users that come to Zed from tools like jq, it does seem like a worthwhile exercise to consider what it might take to provide something similar and rationalize where we think it's appropriate to take a different approach.

Details

At the time this issue is being filed, Zed is at commit 6e61673.

The jq docs don't actually show an example of their most simple debug usage with no parameters, but we can trim their more advanced example to:

$ jq -n '1 as $x | 2 | debug | (.+1)'
["DEBUG:",2]
3

Here's something similar in zq.

$ cat debug.zed 
op debug(): (
  fork (
    => yield error(["DEBUG:", this]) | yield {k:0, v:this}
    => pass | yield {k:1, v:this}
  )
  | sort k
  | yield v
)

$ zq -version
Version: v1.16.0-15-g6e61673c

$ zq -I debug.zed 'const x=1 yield 2 | debug() | yield this+1'
error(["DEBUG:",2])(error([(int64,string)]))
3

jq takes the approach of sending the error outputs to stderr, which means there's ways at the shell to redirect or filter them out. zq doesn't currently take this approach, which is why I chose to wrap the debug messages in Zed error values since there's filtering available within the language itself, e.g., making the following the equivalent of tacking 2> /dev/null onto the end of the jq command line above.

$ zq -I debug.zed 'const x=1 yield 2 | debug() | yield this+1 | !is_error(this)'
3

Of course, this begs the question of whether this close coupling between the tool and the shell is ideal. A proficient jq user that later comes to zq might indeed have the muscle memory that would make this seem appealing. However, even if this were enabled specifically with zq to benefit those shell-centric jq-like users, what would happen in other places where the Zed language can be used? For instance, over the Zed lake API there's no concept of "stderr", so where would the debug output go in that case? In an API-dependent app like Zui, we could perhaps imagine some likely UI, e.g., perhaps a separate window or panel could separate the debug output from query output, or perhaps it could be interleaved with the query output in a way that makes stand out visually and easy to filter. So this would all require design insight.

Moving on to the more advanced example as it appears in the jq docs:

$ jq -n '1 as $x | 2 | debug("Entering function foo with $x == \($x)", .) | (.+1)'
["DEBUG:","Entering function foo with $x == 1"]
["DEBUG:",2]
3

I have what I believe would be a working enhancement that once again approximates what jq is doing, taking the approach of passing the multiple "message" parameters as elements of an array. However, I hit what appears to be a bug #5180.

$ cat debug-msgs.zed 
op debug(msgs): (
  fork (
    => over msgs => (
      yield error(["DEBUG:", this])
      | yield {k:count(), v:this}
    )
    => pass | yield {k:Inf, v:this}
  )
  | sort k
  | yield v
)

$ zq -I debug-msgs.zed 'const x=1 yield 2 | debug([f"Entering function foo with $x == {x}", this]) | yield this+1'
error(["DEBUG:","Entering function foo with $x == 1"])
error(["DEBUG:",error("missing")])(error([(string,error(string))]))
3

I'll plan to pick up this effort once that's addressed.

philrz commented 2 months ago

In a recent chat @mccanne refreshed me on his last design thoughts on this topic. We have this concept of "output channels" which are currently numbered but we they should probably be named since that would allow a debug operator to just be something that sends output to a named channel. To mimic the jq behavior for users that want that, zq could just send the output from such named channels to stderr by default. How the same would be surfaced in the API and Zui could be a follow-on design topic.

philrz commented 1 month ago

Verified in Zed commit 97e0d9c.

Revisiting the jq example studied above, here's how to achieve the equivalent using Zed's new debug operator added via #5196. Splitting hairs between the two implementations, Zed's takes just a single argument, so one way to see the multi-part output shown in the jq doc would be to pass a single array to Zed debug.

$ zq -version
Version: v1.17.0-21-g97e0d9c1

$ zq 'const x=1 yield 2 | debug [f"DEBUG: Entering function foo with x == {x}", f"DEBUG: {this}"] | yield this+1' 2> debug.out
3

$ cat debug.out
["DEBUG: Entering function foo with x == 1","DEBUG: 2"]

Another approach would be to invoke debug multiple times, which could indeed prove handy for other use cases, such as to watch the transform of data as it moves through multiple steps in a pipeline.

$ zq 'const x=1 yield 2 | debug f"DEBUG: Entering function foo with x == {x}" | debug f"DEBUG: {this}" | yield this+1' 2> debug.out
3

$ cat debug.out 
"DEBUG: Entering function foo with x == 1"
"DEBUG: 2"

However, this second example reveals a subtlety of the Zed implementation: If invoked multiple times within the same program like this, the relative order of the debug outputs may not match the relative order that the debug operators appear on a pipeline, e.g., on a re-run:

$ zq 'const x=1 yield 2 | debug f"DEBUG: Entering function foo with x == {x}" | debug f"DEBUG: {this}" | yield this+1' 2> debug.out
3

$ cat debug.out 
"DEBUG: 2"
"DEBUG: Entering function foo with x == 1"

I've opened #5225 to disclose this in the docs.

For now, debug is only useful with the Zed CLI tools for sending the output to stderr, but we have plans to make the output prominent in Zui as tracked in https://github.com/brimdata/zui/issues/3138.

Thanks @mattnibs!