cube2222 / octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Mozilla Public License 2.0

Panic #278

Closed: kpym closed this issue 2 years ago

kpym commented 2 years ago

I was trying to compare the performance of octosql with that of xsv and csvq by running the following command (on Windows 10):

octosql "select Country,City from worldcitiespop_mil.csv order by Country"

using a 1,000,000-record file, worldcitiespop_mil.csv. After 9 seconds, octosql crashed with the following message:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0x1259683]

goroutine 1 [running]:
github.com/cube2222/octosql/outputs/batch.(*OutputPrinter).Run.func1({{0x0?, 0x0?}}, {{0xc0003fe3c0, 0x2, 0x2}, 0x0, {0x0, 0x0, 0x0}})
        /home/runner/work/octosql/octosql/outputs/batch/live_output.go:86 +0x203
github.com/cube2222/octosql/execution/nodes.(*Limit).Run.func1({{0x176da08?, 0xc0001a6d00?}}, {{0xc0003fe3c0, 0x2, 0x2}, 0x0, {0x0, 0x0, 0x0}})
        /home/runner/work/octosql/octosql/execution/nodes/limit.go:35 +0xd7
github.com/cube2222/octosql/execution/nodes.produceOrderByItems.func1({0x1760f40?, 0xc0002d39f0?})
        /home/runner/work/octosql/octosql/execution/nodes/order_by.go:119 +0x124
github.com/google/btree.(*node).iterate(0xc0002b23c0, 0x1, {0x0, 0x0}, {0x0, 0x0}, 0x0, 0x0, 0xc00290c620)
        /home/runner/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:524 +0x322
github.com/google/btree.(*node).iterate(0xc0002b2480, 0x1, {0x0, 0x0}, {0x0, 0x0}, 0x0, 0x0, 0xc00290c620)
        /home/runner/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:512 +0x1a5
github.com/google/btree.(*node).iterate(0xc002a1b3c0, 0x1, {0x0, 0x0}, {0x0, 0x0}, 0x0, 0x0, 0xc00290c620)
        /home/runner/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:512 +0x1a5
github.com/google/btree.(*BTree).Ascend(...)
        /home/runner/go/pkg/mod/github.com/google/btree@v1.0.0/btree.go:777
github.com/cube2222/octosql/execution/nodes.produceOrderByItems({{0x176da08?, 0xc0001a6d00?}}, 0xc0001a6d00?, 0x0?)
        /home/runner/work/octosql/octosql/execution/nodes/order_by.go:113 +0x8d
github.com/cube2222/octosql/execution/nodes.(*OrderBy).Run(0xc0002b2380, {{0x176da08?, 0xc0001a6d00?}, 0x0?}, 0x1dfabf0?, 0x0?)
        /home/runner/work/octosql/octosql/execution/nodes/order_by.go:105 +0x1dd
github.com/cube2222/octosql/execution/nodes.(*Limit).Run(0xc0002966a0, {{0x176da08?, 0xc0001a6d00?}, 0x0?}, 0xc0002fbb00, 0x0?)
        /home/runner/work/octosql/octosql/execution/nodes/limit.go:34 +0x3a6
github.com/cube2222/octosql/outputs/batch.(*OutputPrinter).Run(0xc0002e0300, {{0x176da08?, 0xc0001a6d00?}, 0x0?})
        /home/runner/work/octosql/octosql/outputs/batch/live_output.go:81 +0x396
github.com/cube2222/octosql/cmd.glob..func4(0x1d7a760, {0xc00028c0b0, 0x1, 0x1?})
        /home/runner/work/octosql/octosql/cmd/root.go:463 +0x3653
github.com/spf13/cobra.(*Command).execute(0x1d7a760, {0xc00009e3b0, 0x1, 0x1})
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x1d7a760)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:895
github.com/cube2222/octosql/cmd.Execute({0x176da08?, 0xc0001a6d00?})
        /home/runner/work/octosql/octosql/cmd/root.go:476 +0x53
main.main()
        /home/runner/work/octosql/octosql/main.go:24 +0xe8
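
For readers unfamiliar with this class of Go panic, here is a minimal sketch of how "invalid memory address or nil pointer dereference" arises and how it can be converted into an error. It is not the octosql code: `printer`, `emit` and `safeEmit` are hypothetical names made up for illustration, and the actual cause in `live_output.go` is not shown in this thread.

```go
package main

import "fmt"

// printer is a hypothetical stand-in for an output component.
// It is NOT a type from octosql.
type printer struct {
	rows int
}

// emit reads a field through p. If p is nil, the Go runtime raises
// the same kind of panic as the one reported above.
func emit(p *printer) int {
	return p.rows
}

// safeEmit calls emit and turns a runtime panic into an ordinary error,
// so the caller can report it instead of crashing.
func safeEmit(p *printer) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return emit(p), nil
}

func main() {
	// A nil pointer triggers the panic, which safeEmit recovers from.
	_, err := safeEmit(nil)
	fmt.Println(err)

	// A valid pointer works as expected.
	n, err := safeEmit(&printer{rows: 3})
	fmt.Println(n, err)
}
```

Recovering is only a way to fail gracefully; the usual fix is to make sure the value is initialised before it is used.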
cube2222 commented 2 years ago

Hey @kpym! Thank you for the report! I will take a look at it ASAP, but in the meantime you can use --output json to sidestep this issue:

octosql "select Country,City from worldcitiespop_mil.csv order by Country" --output json
cube2222 commented 2 years ago

This should now be fixed in the newest release, v0.7.2. That said, if performance is your goal, --output json should still be faster.

Thanks again for the easily reproducible bug report!

kpym commented 2 years ago

@cube2222 Thanks for considering and fixing this bug so quickly 🙏

About the benchmark: with --output csv, octosql (≈ 10s) is slower than xsv (≈ 2s), csvq (≈ 4s), mlr (≈ 4s), gocsv (≈ 8s), trdsql (≈ 8s) and textql (≈ 8s).