brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Canceled load causes "panic: runtime error: slice bounds out of range" #5150

Open philrz opened 6 days ago


tl;dr

I managed to crash zed serve by starting a zed load and hitting Ctrl-C partway through.

Details

The repro uses the GA release of Zed tagged v1.16.0.

I first start zed serve, then start a zed load of some data. A couple of seconds into the load, I hit Ctrl-C to stop it. At that point zed serve panics and exits.

I initially saw this with the large wrccdc data set I was using for benchmarking, but have since also reproduced it with conn.log.gz from the zed-sample-data.

Client operations:

$ zed -version
Version: v1.16.0

$ zed create -use foo
pool created: foo 2iTi3wOfxnJMkdpemrhH7ZzN4au
Switched to branch "main" on pool "foo"

$ zed load conn.log.gz 
  (0/1) 18.78MB/28.87MB 18.77MB/s 65.06%
Post "http://localhost:9867/pool/2iTi3wOfxnJMkdpemrhH7ZzN4au/branch/main": context canceled

The crash on the zed serve side:

panic: runtime error: slice bounds out of range [:258082541] with capacity 63070208

goroutine 211 [running]:
github.com/brimdata/zed.(*Arena).bytes_(0x2ac8870?, 0xc000494cc8?)
    /Users/phil/work/zed/arena.go:191 +0x3dc
github.com/brimdata/zed.Value.Bytes({0x100000c000526990, 0x4000002200009957})
    /Users/phil/work/zed/value.go:232 +0x20a
github.com/brimdata/zed/runtime/sam/expr.(*DotExpr).Eval(0xc0003f4540, {0x2ac8510?, 0xc0004b80a0?}, {0x2639440?, 0x1?})
    /Users/phil/work/zed/runtime/sam/expr/dot.go:50 +0x21c
github.com/brimdata/zed/runtime/sam/expr.(*missingAsNull).Eval(0xc0004c01e0?, {0x2ac8510?, 0xc0004b80a0?}, {0x100000c000526990, 0x4000002200009957})
    /Users/phil/work/zed/runtime/sam/expr/sort.go:137 +0x2f
github.com/brimdata/zed/runtime/sam/expr.(*Comparator).sortStableIndices(0xc00042d560, {0xc002a22000, 0x9ff56, 0xad400})
    /Users/phil/work/zed/runtime/sam/expr/sort.go:33 +0x243
github.com/brimdata/zed/runtime/sam/expr.(*Comparator).SortStableReader(...)
    /Users/phil/work/zed/runtime/sam/expr/sort.go:250
github.com/brimdata/zed/lake.(*Writer).writeObject.func1()
    /Users/phil/work/zed/lake/writer.go:127 +0x4c
created by github.com/brimdata/zed/lake.(*Writer).writeObject in goroutine 210
    /Users/phil/work/zed/lake/writer.go:126 +0x1af

It doesn't repro 100% of the time, so I've been hitting Ctrl-C partway through the load and, if it doesn't crash, pressing up-arrow to rerun the load, repeating until it does. It reliably crashes within a few attempts, as shown in the attached video.

https://github.com/brimdata/zed/assets/5934157/9fa81da7-8855-40ed-965d-69ce7dc11d4d

Note that in the video's run, the crash showed "index out of range" instead. I'm not sure whether there are multiple problems here, but once we feel this is addressed, a "torture test" seems justified: repeat an aborted load like this one after random wait periods and make sure zed serve never crashes.