cube2222 / octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Mozilla Public License 2.0

Panic in parquet query #327

Open yusufozturk opened 1 year ago

yusufozturk commented 1 year ago

I was using ClickBench queries to test octosql performance.

I get the following error for query number 23:

panic: runtime error: index out of range [1986948931] with length 4192

Query:

octosql.exe "SELECT * FROM hits.parquet WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10"

Full error:

goroutine 1 [running]:
github.com/segmentio/parquet-go.(*byteArrayDictionary).Index(0x58dd85?, 0x11c340?)
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/dictionary.go:86 +0xed
github.com/segmentio/parquet-go.(*indexedPageReader).ReadValues(0xc02da4df20, {0xc002901fe0, 0xaa, 0xb8045a?})
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/dictionary.go:338 +0x89
github.com/segmentio/parquet-go.(*columnChunkReader).readValuesFromCurrentPage(0xc0028b80c0)
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/column_chunk.go:135 +0x90
github.com/segmentio/parquet-go.(*columnChunkReader).readValues(0xfcf180?)
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/column_chunk.go:115 +0x29
github.com/segmentio/parquet-go.columnReadRowFuncOfLeaf.func1({0xc02e69b710?, 0x2, 0x2}, 0x0?, {0xc0028b8000, 0x0?, 0x0?})
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/column_chunk.go:326 +0xc5
github.com/segmentio/parquet-go.makeColumnReadRowFunc.func1({0x0?, 0x3?, 0x0?}, 0x0?, {0xc0028b8000, 0x69, 0x69})
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/schema.go:163 +0xa3
github.com/segmentio/parquet-go.(*rowGroupRowReader).ReadRow(0x0?, {0x0?, 0x0, 0x0?})
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/row_group.go:306 +0xb7
github.com/segmentio/parquet-go.(*reader).ReadRow(0xc00165c370, {0x0?, 0x0, 0x0?})
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/reader.go:276 +0xb1
github.com/segmentio/parquet-go.(*Reader).ReadRow(0xc00165c360, {0x0, 0x0, 0x0})
        /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/reader.go:221 +0x65
github.com/cube2222/octosql/datasources/parquet.(*DatasourceExecuting).Run(0xc00142f140, {{0x115d8f0?, 0xc000384e40?}, 0x0?}, 0xc00142f1d0, 0x0?)
        /home/runner/work/octosql/octosql/datasources/parquet/execution.go:47 +0x512
github.com/cube2222/octosql/execution/nodes.(*Filter).Run(0xc0003dac20, {{0x115d8f0?, 0xc000384e40?}, 0x0?}, 0xc0000e15e0, 0x1149976?)
        /home/runner/work/octosql/octosql/execution/nodes/filter.go:23 +0xfc
github.com/cube2222/octosql/outputs/batch.(*OutputPrinter).Run(0xc00165c2d0, {{0x115d8f0?, 0xc000384e40?}, 0x0?})
        /home/runner/work/octosql/octosql/outputs/batch/live_output.go:116 +0x4b9
github.com/cube2222/octosql/cmd.glob..func4(0x1799140, {0xc00037ead0, 0x1, 0x1?})
        /home/runner/work/octosql/octosql/cmd/root.go:458 +0x3b34
github.com/spf13/cobra.(*Command).execute(0x1799140, {0xc00009e3b0, 0x1, 0x1})
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x1799140)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:895
github.com/cube2222/octosql/cmd.Execute({0x115d848?, 0xc00022edc0?})
        /home/runner/work/octosql/octosql/cmd/root.go:471 +0x53
main.main()
        /home/runner/work/octosql/octosql/main.go:24 +0xe8
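
For readers unfamiliar with this kind of panic: the top frame is a dictionary lookup, and the message says the index read from the page (1986948931) is far larger than the dictionary's 4192 entries. The following is a minimal sketch, not parquet-go's actual code. The `dictionary` type and its `lookup` method are hypothetical names, used only to illustrate how a bounds check could surface a bad index as an error instead of a panic.

```go
package main

import "fmt"

// dictionary is a hypothetical stand-in for a dictionary-encoded column:
// each distinct value is stored once, and pages refer to it by integer index.
type dictionary struct {
	values []string
}

// lookup returns the value at index i. It reports an error, rather than
// panicking, when the index does not fit the dictionary.
func (d *dictionary) lookup(i int32) (string, error) {
	if i < 0 || int(i) >= len(d.values) {
		return "", fmt.Errorf("dictionary index %d out of range (length %d)", i, len(d.values))
	}
	return d.values[i], nil
}

func main() {
	// Same sizes as in the report: 4192 entries, index 1986948931.
	d := &dictionary{values: make([]string, 4192)}
	d.values[0] = "example"

	if _, err := d.lookup(1986948931); err != nil {
		fmt.Println("error:", err)
	}
	if v, err := d.lookup(0); err == nil {
		fmt.Println("value:", v)
	}
}
```

A check like this would not fix whatever produces the bad index. It would only let the caller report a readable error for the affected file.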

The same query runs on DuckDB:

[image: screenshot of the query running on DuckDB]

cseefurth commented 1 week ago

Same here, with a SELECT on a Parquet file. Linux Mint, octosql v0.12.2.

panic: runtime error: index out of range [893006642] with length 66306

goroutine 1 [running]:
github.com/segmentio/parquet-go.(*byteArrayDictionary).Index(0x7a6e715875b8?, 0x2a9c3a8?)
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/dictionary.go:86 +0xed
github.com/segmentio/parquet-go.(*indexedPageReader).ReadValues(0xc0008b64a0, {0xc0008d3f00, 0xaa, 0x40d4e5?})
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/dictionary.go:338 +0x89
github.com/segmentio/parquet-go.(*optionalPageReader).ReadValues(0xc00034c3a0, {0xc0008d3f00, 0xaa, 0xaa})
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/page.go:382 +0x14a
github.com/segmentio/parquet-go.(*columnChunkReader).readValuesFromCurrentPage(0xc0008de600)
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/column_chunk.go:135 +0x90
github.com/segmentio/parquet-go.(*columnChunkReader).readValues(0xe3f9c0?)
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/column_chunk.go:115 +0x29
github.com/segmentio/parquet-go.columnReadRowFuncOfLeaf.func1({0xc00bb84d80?, 0x10, 0x10}, 0x60?, {0xc0008de000, 0x0?, 0x0?})
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/column_chunk.go:326 +0xc5
github.com/segmentio/parquet-go.makeColumnReadRowFunc.func1({0x0?, 0x0?, 0x0?}, 0x0?, {0xc0008de000, 0x1a, 0x1a})
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/schema.go:163 +0xa3
github.com/segmentio/parquet-go.(*rowGroupRowReader).ReadRow(0x0?, {0x0?, 0x0, 0x0?})
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/row_group.go:306 +0xb7
github.com/segmentio/parquet-go.(*reader).ReadRow(0xc000462490, {0x0?, 0x0, 0x1a?})
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/reader.go:276 +0xb1
github.com/segmentio/parquet-go.(*Reader).ReadRow(0xc000462480, {0x0, 0x0, 0x0})
    /home/runner/go/pkg/mod/github.com/cube2222/parquet-go@v0.0.0-20220512155810-0e06eee50261/reader.go:221 +0x65
github.com/cube2222/octosql/datasources/parquet.(*DatasourceExecuting).Run(0xc000897c80, {{0xfc1e10?, 0xc0002a9680?}, 0x0?}, 0xc0008755c0, 0xc0002a6800?)
    /home/runner/work/octosql/octosql/datasources/parquet/execution.go:47 +0x512
github.com/cube2222/octosql/outputs/eager.(*OutputPrinter).Run(0xc00082d300, {{0xfc1e10?, 0xc0002a9680?}, 0x0?})
    /home/runner/work/octosql/octosql/outputs/eager/eager.go:39 +0x1e4
github.com/cube2222/octosql/cmd.glob..func4(0x15f71c0, {0xc0002a94d0, 0x1, 0x3?})
    /home/runner/work/octosql/octosql/cmd/root.go:458 +0x3b54
github.com/spf13/cobra.(*Command).execute(0x15f71c0, {0xc0000300d0, 0x3, 0x3})
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x15f71c0)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:895
github.com/cube2222/octosql/cmd.Execute({0xfc1d68?, 0xc0001ef000?})
    /home/runner/work/octosql/octosql/cmd/root.go:471 +0x53
main.main()
    /home/runner/work/octosql/octosql/main.go:24 +0xe8
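
One observation that may help narrow this down: both out-of-range indices in this thread decode to printable ASCII when their four bytes are read in little-endian order. That is consistent with string data being interpreted as 32-bit dictionary indices, but it is only a hypothesis and does not confirm the root cause. The helper name `asLittleEndianText` below is made up for this sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// asLittleEndianText renders a 32-bit value as the four bytes it would
// occupy in a little-endian buffer, interpreted as text.
func asLittleEndianText(v uint32) string {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], v)
	return string(b[:])
}

func main() {
	// The two indices reported in the panics in this thread.
	for _, idx := range []uint32{1986948931, 893006642} {
		fmt.Printf("%d -> %q\n", idx, asLittleEndianText(idx))
	}
	// Prints:
	// 1986948931 -> "Conv"
	// 893006642 -> "23:5"
}
```

The second value looks like a fragment of a time string, the first like the start of a word. Checking which encodings the affected column chunks declare in the file metadata would be one way to test the hypothesis.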