brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Successful zed client exit code when lake service panicked #5270

Open philrz opened 3 weeks ago

tl;dr

With some operators (e.g., join), a query can fail because of a panic in the lake service, yet the zed client sees only an empty query response and exits with a successful exit code of 0. This is a problem because an empty response with exit code 0 is also what the tooling produces in genuine success cases, e.g., a search that finds no matches, so a caller cannot tell the two apart.

Details

Repro is with GA Zed tagged v1.15.0. Repro steps are shown in #5097.

This was a secondary issue originally surfaced in https://github.com/brimdata/zed/issues/5097#issuecomment-2037867789. Since the primary issue #5097 has been addressed, I'm opening this new issue to capture the remaining problem for future consideration.

In a group discussion, @nwt mentioned that the successful exit code is a known problem that shows up in a couple of operators (join and sort were the two mentioned, I think) where a goroutine is used and the plumbing is not present to handle a panic gracefully. He explained that it's not a hard thing to fix, but it's kind of messy, which is why it hasn't been taken up yet.
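
To illustrate the kind of plumbing being described, here is a minimal sketch in plain Go of recovering a panic inside a worker goroutine and handing it back to the caller as an ordinary error, so it can end up as a non-zero exit code. This is not Zed's code: `runGuarded` and `exitCode` are hypothetical names made up for this example, and the real operators and the client/service boundary are more involved.

```go
package main

import (
	"fmt"
	"os"
)

// runGuarded is a hypothetical helper (not part of Zed). It runs work in
// its own goroutine and converts a panic in that goroutine into an error
// instead of letting it go unreported to the caller.
func runGuarded(work func() error) error {
	errCh := make(chan error, 1) // buffered so the goroutine never blocks
	go func() {
		defer func() {
			// recover only has an effect inside a deferred function
			// running in the goroutine that panicked.
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("worker panicked: %v", r)
			}
		}()
		errCh <- work()
	}()
	return <-errCh
}

// exitCode is a hypothetical mapping from an error to a process exit code.
func exitCode(err error) int {
	if err != nil {
		return 1
	}
	return 0
}

func main() {
	// Stand-in for an operator goroutine that hits a bug mid-query.
	err := runGuarded(func() error {
		panic("boom")
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	// A real client would pass this value to os.Exit; it is printed here
	// so the sketch stays easy to run and inspect.
	fmt.Println("exit code:", exitCode(err))
}
```

The point of the sketch is that a panic can only be recovered in the goroutine where it happened, so each goroutine an operator spawns needs its own deferred `recover` plus a channel (or similar) to carry the failure back. That per-goroutine wiring is presumably the "messy" part.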