brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License
1.34k stars 67 forks source link

Implicit stdin support / Unintuitive behaviour when omitting '-' #5110

Open Dessix opened 2 months ago

Dessix commented 2 months ago

When attempting to onboard from jq, I've had somewhat unintuitive results exploring basic commands:

echo '{a:1,b:2,c:3}' | zq 'yield this' - => {a:1,b:2,c:3}

So far so good, though the docs list the default query when omitting one as being *, so let's try that:

echo '{a:1,b:2,c:3}' | zq '*' - => {a:1,b:2,c:3}

Cool, equivalent behaviour to what I get from having no query:

echo '{a:1,b:2,c:3}' | zq - => {a:1,b:2,c:3}

Except... This option of omitting the query entirely leads into some interesting territory. According to the --help docs:

If the first argument is a path to a valid file rather than a Zed query, then the Zed query is assumed to be "*", i.e., match and output all of the input. If the first argument is both a valid Zed query and an existing file, then the file overrides.

So what happens if we make a request with a valid query and allow standard input to be assumed as the input?

echo '{a:1,b:2,c:3}' | zq 'yield this' => null

Here's our first unexpected behaviour: not passing a query seems to ignore stdin. I thought maybe this was due to - counting as a valid zed query, but the following experiment shows otherwise:

echo '{a:1,b:2,c:3}' | zq '-' - => {a:1,b:2,c:3}

So let's try passing the real "default" query given by the CLI docs:

> echo '{a:1,b:2,c:3}' | zq '*'
zq: could not invoke zq with a single argument because:
 - a file could not be found with the name "*"
 - the argument did not parse as a valid Zed query

So the default zed query of '*' is not a valid zed query?

... Except when you pass stdin explicitly:

echo '{a:1,b:2,c:3}' | zq '*' - => {a:1,b:2,c:3}

As a new user, the issue seems to be that allowing implicit queries is a significant footgun and should be replaced with the more POSIX-friendly implicit stdin input convention. There may be some nuance here that makes shell scripting a lower-priority target than some other workflow that the zq tutorial hasn't introduced, but this is not apparent from day one, and the surprising nature of the CLI behaviour makes me hesitant to adopt this in lieu of JQ.

philrz commented 2 months ago

@Dessix: Thanks for this feedback. Indeed, the core Zed development team has also recognized there's room for improvement here. It helps to have the input of a user that's coming at this with fresh eyes. We'll have another round of discussion and see what we can improve here. I'll circle back with findings.

Dessix commented 1 month ago

Wait, doesn’t that just announce the unintuitive behavior instead of correcting it?

philrz commented 1 month ago

@Dessix: Indeed, a PR merge triggered the early closure of this issue, so I'm reopening. As you noted, #5119 just rationalizes the tool's current behavior so it's no longer contradicting the docs. There's more material improvements we still plan to make that should more directly address the unintuitive behavior. We'll circle back with more detail soon.

Dessix commented 1 month ago

Thank you @philrz ; I'll keep following this to make sure it doesn't get lost beneath GitHub noise then ^^

I believe the Closes 5110 note is why the PR closed it upon merge, incidentally.

philrz commented 1 month ago

@Dessix: Yes, it certainly was the Closes #5110. It would be nice if they had an easier way to link up PRs and Issues in advance without the auto-close trigger upon merge. Instead I've found I have to manually do the linkage after the fact. Anyway, separate topic. 😛

Here's the more detailed response to your concerns. The main improvement we still intend to make is what's described in #5116. That should help with the unexpected behavior you cited in this query:

echo '{a:1,b:2,c:3}' | zq 'yield this' => null

The reason that was happening was due to the "calculator mode" described in the closing of the Usage section of the zq command docs. That's not covered in the zq tutorial which is focused on the more common use cases of streaming/file/URL inputs. But it's helpful to know how you managed to bump into it while following just the tutorial. Looking at it through the eyes of a new user, indeed, we can see how zq's "implicit null input" is not intuitive there and it's unreasonable for us to expect users to read through enough docs to understand that behavior, let alone all like it.

When we add what's described in #5116, then null input behavior will go away and your example above will now return {a:1,b:2,c:3} as you originally expected. Users that still want the "calculator mode" will then have to add the -n flag, but since this is a less common use case it seems like a good trade-off to avoid confusing future users.

Regarding your other finding:

echo '{a:1,b:2,c:3}' | zq '-' - => {a:1,b:2,c:3}

That's expected. The single quotes in '-' get stripped by the shell, then as a developer colleague summarized for me:

zq - - says “read from standard input twice”. Standard input is at EOF after zq has read it once so reading it a second time produces no values.

This is not unique to zq, as you can see the same effect with echo 'hello' | cat '-' - for instance (run it under set -xv to see the effect of the shell stripping the quotes).

Finally, regarding a possible shift to an overall:

friendly implicit stdin input convention

Your point is understood, and we're still discussing that one.