brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Out-of-order output from Zed program with conditional logic #5078

Closed philrz closed 6 months ago

philrz commented 6 months ago

Repro is with Zed commit f1be6a4.

The simplified program in #5076 is based on the program shown here, so perhaps fixing one of these issues will explain or fix the other.

This issue was found when working on a response to a community user's question in a Slack thread. The user's question in their own words:

Is there way to run some sort of type assertion in a function?

op flatten_array(a): (over this | over this | collect(this))

I want to ensure a is an array of arrays

While working my way up to the full response, I got this far, which works as expected on individual inputs:

$ cat maybe_flatten.zed 
op maybe_flatten(a): (
  switch kind(a) (
    case "array" => over a with val=a => (
      and(kind(this) == "array")
      | switch this (
        case true => yield "It's an array of arrays, so I would flatten"
        case false => yield "It's an array containing some non-array values, so I would not flatten"
      )
    )
    default => yield "It's not even an array, so I ain't touchin' that"
  )
)

$ zq -version
Version: v1.14.0-17-gf1be6a4a

$ echo '[[1,2],[3,4]]' | zq -I maybe_flatten.zed 'maybe_flatten(this)' -
"It's an array of arrays, so I would flatten"

$ echo '"hello"' | zq -I maybe_flatten.zed 'maybe_flatten(this)' -
"It's not even an array, so I ain't touchin' that"

$ echo '[1,2,3,4]' | zq -I maybe_flatten.zed 'maybe_flatten(this)' -
"It's an array containing some non-array values, so I would not flatten"

However, now look what happens if I combine all three of those values in a single input file.

$ cat input.zson 
[[1,2],[3,4]]
"hello"
[1,2,3,4]

$ cat input.zson | zq -I maybe_flatten.zed 'maybe_flatten(this)' -
"It's not even an array, so I ain't touchin' that"
"It's an array of arrays, so I would flatten"
"It's an array containing some non-array values, so I would not flatten"

As we can see, the first output is the one associated with the middle "hello" input, which is unexpected. If I were using fork in my flowgraph, I would not have been surprised. However, as a user, I don't see how conditional logic or anything else about this program would cause values to be emitted out of order.

philrz commented 6 months ago

On further reflection, I now recognize that this behavior is described in the docs and is therefore not a bug. It seems I have a blind spot here: I'm accustomed to not expecting same-order output after things like fork and aggregations, but for some reason I didn't expect the same from switch. I'll discuss with the team at some point, but I'm closing this issue.
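
For anyone with the same blind spot, here is a toy Python model of why a branching operator followed by a merge need not preserve input order. It is only a sketch, not how Zed is implemented. The names `classify`, `switch_then_merge`, and `branch_order` are hypothetical, and `branch_order` merely stands in for whatever order a merge happens to drain its branches in. The sketch also shows one general way to recover input order: tag each value with a sequence number before the branch point and sort on it afterwards.

```python
from collections import defaultdict

MSG_FLATTEN = "It's an array of arrays, so I would flatten"
MSG_MIXED = "It's an array containing some non-array values, so I would not flatten"
MSG_OTHER = "It's not even an array, so I ain't touchin' that"


def classify(value):
    """Sequential equivalent of the three cases: one message per input value."""
    if not isinstance(value, list):
        return MSG_OTHER
    # Assumption: an empty list counts as "array of arrays" in this model;
    # the Zed program may treat an empty array differently.
    if all(isinstance(item, list) for item in value):
        return MSG_FLATTEN
    return MSG_MIXED


def switch_then_merge(values, branch_order):
    """Toy model: route each value to a branch, then merge the branch outputs.

    branch_order is a hypothetical parameter standing in for the order in
    which the merge drains the branches; a real engine promises no such order.
    """
    branches = defaultdict(list)
    for seq, value in enumerate(values):
        name = "array" if isinstance(value, list) else "default"
        # Keep the input sequence number alongside the result.
        branches[name].append((seq, classify(value)))
    merged = []
    for name in branch_order:
        merged.extend(branches[name])
    return merged


inputs = [[[1, 2], [3, 4]], "hello", [1, 2, 3, 4]]

# Draining the default branch first reproduces the order seen in this issue.
merged = switch_then_merge(inputs, ["default", "array"])
reordered = [msg for _, msg in merged]

# Sorting on the sequence tag restores the original input order.
restored = [msg for _, msg in sorted(merged)]

print(reordered)
print(restored)
```

The point of the model is that each branch preserves order internally, but nothing ties the branches to each other once their outputs are merged. If input order matters downstream, the order has to be carried explicitly as data, as the sequence tag does here.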