brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

typed ndjson: panic when scalar in place of array #779

Closed henridf closed 4 years ago

henridf commented 4 years ago

The typed ndjson reader panics when it reads a record in which a field declared as an array contains a scalar instead.

$ echo '{"_path":"panic","f":"foo"}' | zq -t -j panic.json  -
panic: zeek.TypeArray.Parse shouldn't be called

goroutine 1 [running]:
github.com/brimsec/zq/zng.(*TypeArray).Parse(0xc0002fc4a0, 0xc00037a016, 0x3, 0xffea, 0xc0002ab248, 0x1198398, 0x1813220, 0x0, 0x0)
    /Users/henridf/work/brim/zq/zng/array.go:40 +0x39
github.com/brimsec/zq/zio/ndjsonio.parseSimpleType(0xc00037a016, 0x3, 0xffea, 0x1ba85c0, 0xc0002fc4a0, 0x8, 0xc000302338, 0x6, 0x8, 0x0)
    /Users/henridf/work/brim/zq/zio/ndjsonio/typeparser.go:355 +0x1ab
github.com/brimsec/zq/zio/ndjsonio.appendRecordFromViews.func1(0xc00037a016, 0x3, 0xffea, 0x1, 0x1ba85c0, 0xc0002fc4a0, 0x0, 0x0)
    /Users/henridf/work/brim/zq/zio/ndjsonio/typeparser.go:153 +0x43b
github.com/brimsec/zq/zio/ndjsonio.appendRecordFromViews(0xc0002ab480, 0xc00030e700, 0x2, 0x2, 0xc00030e7c0, 0x1, 0x1, 0xc00037a000, 0x1b, 0xc000039b79, ...)
    /Users/henridf/work/brim/zq/zio/ndjsonio/typeparser.go:177 +0x39b
github.com/brimsec/zq/zio/ndjsonio.(*typeInfo).newRawFromJSON(0xc00030e780, 0xc00037a000, 0x1b, 0x10000, 0x10000, 0xc0003065d0, 0xc000322000, 0x1, 0x4, 0x0)
    /Users/henridf/work/brim/zq/zio/ndjsonio/typeparser.go:206 +0x137
github.com/brimsec/zq/zio/ndjsonio.(*typeParser).parseObject(0xc000318140, 0xc00037a000, 0x1b, 0x10000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3, ...)
    /Users/henridf/work/brim/zq/zio/ndjsonio/typeparser.go:284 +0x23b
github.com/brimsec/zq/zio/ndjsonio.(*Reader).Parse(0xc000306540, 0xc00037a000, 0x1b, 0x10000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /Users/henridf/work/brim/zq/zio/ndjsonio/reader.go:123 +0xe7
github.com/brimsec/zq/zio/ndjsonio.(*Reader).Read(0xc000306540, 0xc000306540, 0xc0002ddb90, 0xc000322000)
    /Users/henridf/work/brim/zq/zio/ndjsonio/reader.go:139 +0xac
github.com/brimsec/zq/zio/detector.match(0x1b93e00, 0xc000306540, 0x1962be4, 0x6, 0x199c38a, 0x59)
    /Users/henridf/work/brim/zq/zio/detector/reader.go:74 +0x38
github.com/brimsec/zq/zio/detector.NewReaderWithConfig(0x1b93d80, 0xc000332de0, 0xc00030a000, 0x7ffeefbffb1c, 0x1, 0x1960719, 0x4, 0x1, 0xc0002ec2a0, 0x199c38a, ...)
    /Users/henridf/work/brim/zq/zio/detector/reader.go:49 +0x5db
github.com/brimsec/zq/zio/detector.OpenFromNamedReadCloser(0xc00030a000, 0x1b9d5c0, 0xc000010010, 0x7ffeefbffb1c, 0x1, 0x1960719, 0x4, 0x1, 0xc0002ec2a0, 0x199c38a, ...)
    /Users/henridf/work/brim/zq/zio/detector/file.go:131 +0xfd
github.com/brimsec/zq/zio/detector.OpenFile(0xc00030a000, 0x7ffeefbffb1c, 0x1, 0x1960719, 0x4, 0x1, 0xc0002ec2a0, 0x199c38a, 0x59, 0x0, ...)
    /Users/henridf/work/brim/zq/zio/detector/file.go:68 +0x11a
main.(*Command).inputReaders(0xc00030c000, 0xc000032090, 0x1, 0x1, 0x0, 0x0, 0xa60000000192ad80, 0xc0002abc10, 0xa6e5108f0a6a7650)
    /Users/henridf/work/brim/zq/cmd/zq/zq.go:270 +0x166
main.(*Command).Run(0xc00030c000, 0xc000032090, 0x1, 0x1, 0x0, 0x0)
    /Users/henridf/work/brim/zq/cmd/zq/zq.go:218 +0x2cf
github.com/mccanne/charm.(*instance).run(0xc0002fc0c0, 0xc000032060, 0x4, 0x4, 0x0, 0x0)
    /Users/henridf/go/pkg/mod/github.com/mccanne/charm@v0.0.3-0.20191224190439-b05e1b7b1be3/instance.go:53 +0x28c
github.com/mccanne/charm.(*Spec).ExecRoot(0x215b0a0, 0xc000032060, 0x4, 0x4, 0xc0002abf50, 0x100742f, 0xc0000a6058, 0x0)
    /Users/henridf/go/pkg/mod/github.com/mccanne/charm@v0.0.3-0.20191224190439-b05e1b7b1be3/charm.go:77 +0x96
main.main()
    /Users/henridf/work/brim/zq/cmd/zq/main.go:9 +0x78
$ cat panic.json 
{
  "descriptors": {
    "panic_log": [
      {
        "name": "_path",
        "type": "string"
      },
      {
        "name": "f",
        "type": "array[int64]"
      }
    ]
  },
  "rules": [
    {
      "descriptor": "panic_log",
      "name": "_path",
      "value": "panic"
    }
  ]
}
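
For readers unfamiliar with the types file, here is a rough Go sketch of how its two sections relate: `rules` maps a field value (`_path` equal to `panic`) to a descriptor, and the descriptor lists the declared column types. The struct and function names (`TypeConfig`, `Column`, `Rule`, `loadConfig`, `arrayColumns`) are hypothetical and are not taken from the zq source. Only the JSON keys come from `panic.json` above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Column, Rule and TypeConfig mirror the JSON keys in panic.json.
// The Go names are hypothetical, not zq's own types.
type Column struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type Rule struct {
	Descriptor string `json:"descriptor"`
	Name       string `json:"name"`
	Value      string `json:"value"`
}

type TypeConfig struct {
	Descriptors map[string][]Column `json:"descriptors"`
	Rules       []Rule              `json:"rules"`
}

// panicJSON is the content of panic.json from the report, compacted.
const panicJSON = `{
  "descriptors": {
    "panic_log": [
      {"name": "_path", "type": "string"},
      {"name": "f", "type": "array[int64]"}
    ]
  },
  "rules": [
    {"descriptor": "panic_log", "name": "_path", "value": "panic"}
  ]
}`

// loadConfig decodes a types file into a TypeConfig.
func loadConfig(data string) (TypeConfig, error) {
	var cfg TypeConfig
	err := json.Unmarshal([]byte(data), &cfg)
	return cfg, err
}

// arrayColumns returns the names of the columns that the given descriptor
// declares with an "array[...]" type.
func arrayColumns(cfg TypeConfig, descriptor string) []string {
	var names []string
	for _, col := range cfg.Descriptors[descriptor] {
		if strings.HasPrefix(col.Type, "array[") {
			names = append(names, col.Name)
		}
	}
	return names
}

func main() {
	cfg, err := loadConfig(panicJSON)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range cfg.Rules {
		fmt.Printf("%s=%q -> descriptor %s, array columns %v\n",
			r.Name, r.Value, r.Descriptor, arrayColumns(cfg, r.Descriptor))
	}
}
```

In the repro, the input record matches the single rule, so `f` is expected to be an array, but the record supplies the string `"foo"`.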

philrz commented 4 years ago

Verified in zq commit a14daef. The original repro steps now generate an appropriate error message rather than a crash.

$ echo '{"_path":"panic","f":"foo"}' | zq -t -j panic.json  -
/dev/stdin: format detection error
    tzng: line 1: strconv.ParseUint: parsing "{\"_path\"": invalid syntax
    zeek: line 1: bad types/fields definition in zeek header
    ndjson: line 1: field "f" (type array[int64]): expected container type, got primitive
    zjson: undefined type ID: 0
    zng: malformed zng record
    parquet: auto-detection not supported

Thanks @henridf!