Closed chapmanjacobd closed 1 year ago
Hey, it's actually the line size that's the problem (it's limited to 1MB right now), but I'm happy to add a config option for this.
This has now been added in https://github.com/cube2222/octosql/commit/6644557f8cce2e7231b201581c98a9519d2ae132 and released in 0.11.1.
You are now able to configure the maximum line size in your ~/.octosql/octosql.yml file:
databases:
  # ...
files:
  json:
    max_line_size_bytes: 33554432
Thanks for the report!
Hi @cube2222, this doesn't actually work (I don't think the context is passed properly).
I've added some printing in cmd/root.go:
fmt.Printf("%+v\n", cfg)
ctx = config.ContextWithConfig(ctx, cfg)
fmt.Printf("%+v\n", ctx)
And in datasources/json/execution.go:
func (d *DatasourceExecuting) Run(ctx ExecutionContext, produce ProduceFn, metaSend MetaSendFn) error {
	fmt.Printf("from json.Run, ctx: %+v\n", ctx)
	f, err := files.OpenLocalFile(ctx, d.path, files.WithTail(d.tail))
	if err != nil {
		return fmt.Errorf("couldn't open local file: %w", err)
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	sc.Buffer(nil, config.FromContext(ctx).Files.JSON.MaxLineSizeBytes)
	fmt.Printf("from json.Run, config from context: %+v\n", config.FromContext(ctx))
And it doesn't seem to be doing anything:
$ ./octosql/main "select * from nat_rules.json" --describe
from root.go, config: &{Databases:[] Files:{JSON:{MaxLineSizeBytes:33554432} BufferSizeBytes:4194304}}
from root.go, context: context.Background.WithCancel.WithValue(config.contextKey, *config.Config)
Usage:
octosql <query> [flags]
octosql [command]
Examples:
octosql "SELECT * FROM myfile.json"
octosql "SELECT * FROM mydir/myfile.csv"
octosql "SELECT * FROM plugins.plugins"
Available Commands:
completion Generate the autocompletion script for the specified shell
help Help about any command
plugin
Flags:
--describe Describe query output schema.
--explain int Describe query output schema.
-h, --help help for octosql
--optimize Whether OctoSQL should optimize the query. (default true)
-o, --output string Output format to use. Available options are live_table, batch_table, csv, json and stream_native. (default "live_table")
--profile string Enable profiling of the given type: cpu, memory, trace.
-v, --version version for octosql
Use "octosql [command] --help" for more information about a command.
Error: typecheck error: couldn't create datasource: couldn't scan lines: bufio.Scanner: token too long
I'll take a more in-depth look later and open a PR.
edit: added PR (2 loc) here https://github.com/cube2222/octosql/pull/336
For what it's worth I think this was working at one point--or maybe I just filtered out the long line, I don't really remember. But here is my octosql config:
$ cat ~/.octosql/octosql.yml
files:
  buffer_size_bytes: 33554432
  json:
    max_line_size_bytes: 33554432
I'm running out of vespene gas or something
sad :'(
the great octopus god is able to work with this other, smaller, file in 110.6s:
It does not use much RAM with either file, so not sure what's up :? Both are similar-ish file-ish size-ish: 7.8GB vs 10GB compressed, maybe 200GB uncompressed.