Open philrz opened 3 months ago
In a group discussion, @nwt mentioned that the successful exit code is a known problem that shows up in a couple of operators (`join` and `sort` were the two mentioned, I think) where a goroutine is used and the plumbing is not present to gracefully handle a panic. He explained that it's not hard to fix, but it's kind of messy, which is why it hasn't been taken up yet. If the primary symptom of this issue is addressed separately, then when we close this issue I can open a new one as a reminder to address the more general problem of handling the panics.
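To make the "plumbing" point concrete, here is a minimal Go sketch of the general pattern: recover a panic inside a worker goroutine and hand it back to the caller as an ordinary error, so the caller can report a failure instead of exiting with `0`. This is not Zed's actual code. The names `runGuarded` and `exitCode` are hypothetical and only illustrate the idea.

```go
package main

import "fmt"

// runGuarded (hypothetical helper) runs fn in its own goroutine and
// converts a panic inside fn into an error delivered to the caller.
func runGuarded(fn func() error) error {
	// Buffered so the goroutine can always send exactly one result.
	errCh := make(chan error, 1)
	go func() {
		defer func() {
			// recover only works in a deferred function running in the
			// same goroutine that panicked.
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("operator panicked: %v", r)
			}
		}()
		errCh <- fn()
	}()
	return <-errCh
}

// exitCode (hypothetical helper) maps an error to a process exit status.
func exitCode(err error) int {
	if err != nil {
		return 1
	}
	return 0
}

func main() {
	// Stand-in for an operator whose goroutine hits a bug.
	err := runGuarded(func() error {
		panic("simulated operator failure")
	})
	if err != nil {
		fmt.Println("query failed:", err)
	}
	// A real CLI would pass this value to os.Exit.
	fmt.Println("exit code:", exitCode(err))
}
```

The messy part in a real query engine is presumably that every operator that spawns a goroutine needs a wrapper like this, and the resulting error has to travel through the operator pipeline and the service's response stream before the client can turn it into a non-zero exit code.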
Repro is with GA Zed tagged `v1.15.0` and the test data at `s3://zed-issues/5097/all.zng.gz`, as it's too large to attach to a GitHub Issue.

Save this `program.zed`:

Now start a local `zed serve`, then load the test data and run the query as shown below. (For some reason I've not been able to repro the problem with `zq` or querying the lake directly at a filesystem location).

At this point we find the `zed serve` has crashed with this panic dump:

In addition to the panic, I'm somewhat concerned that the `zed query` simply produced no output and had an exit code of `0` implying success. I respect that there may be catastrophic failures like `kill -9` in which the `zed serve` process would not be given a chance to say anything meaningful as it dies, but in this case since it was able to produce the panic dump at the console, is there some way it could have also returned enough info back to the client to imply something went wrong?

FYI, this bug does not seem new. I went through several releases all the way back to Zed `v1.10.0` and these repro steps produced a panic in each one.