Open sundy-li opened 2 weeks ago
Use (type = TSV compression=gzip), as in COPY:
curl -XPUT 'http://root:@127.0.0.1:8000/v1/streaming_load' -H 'insert_sql: insert into hackernews_1m FILE_FORMAT = (type = TSV compression=gzip) ' -F 'upload=@"./hacknernews_1m.csv.gz"'
{"id":"ed17ccdc-2ea2-4fac-9ff9-97b0f61fd487","state":"SUCCESS","stats":{"rows":1000000,"bytes":130618406},"error":null,"files":["hacknernews_1m.csv.gz"]}%
There are two issues.
"stats":{"rows":2000000,"bytes":642061025}
let compression = input_context
.get_compression_alg(&filename)
.map_err(BadRequest)?;
The compression is wrong.
The default compression is none, not auto.
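The difference between a default of none and a default of auto can be sketched as follows. This is a minimal illustration, not Databend's actual code: the `Compression` enum and `resolve_compression` function are hypothetical names, and the sketch assumes that auto means guessing the algorithm from the file extension. Under that assumption, a `.gz` upload with no compression option would be treated as uncompressed.

```rust
/// Hypothetical compression options; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    None,
    Auto,
    Gzip,
}

/// Resolve the algorithm to apply to an uploaded file.
/// Assumption: an omitted option falls back to `None` (not `Auto`),
/// and `Auto` looks only at the file extension.
fn resolve_compression(option: Option<Compression>, filename: &str) -> Compression {
    match option.unwrap_or(Compression::None) {
        Compression::Auto => {
            if filename.to_ascii_lowercase().ends_with(".gz") {
                Compression::Gzip
            } else {
                Compression::None
            }
        }
        // An explicit `None` or `Gzip` is used as given.
        explicit => explicit,
    }
}

fn main() {
    let file = "hacknernews_1m.csv.gz";
    // No option given: the default applies, so the extension is ignored.
    println!("{:?}", resolve_compression(None, file));
    // Explicit gzip, as in `compression=gzip`.
    println!("{:?}", resolve_compression(Some(Compression::Gzip), file));
    // Auto: inferred from the `.gz` suffix.
    println!("{:?}", resolve_compression(Some(Compression::Auto), file));
}
```

Under these assumptions, only the explicit `gzip` and `auto` cases decompress the `.gz` upload, which is why the option is spelled out in the `FILE_FORMAT` clause.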
The stats are right on my Mac.
The bytes you get are different too; maybe you have a different hacknernews_1m.csv?
(venv) ➜ test git:(stage2) ls -lh hacknernews_1m.csv*
-rw-r--r-- 1 yangxiufeng staff 342M Sep 3 2023 hacknernews_1m.csv
-rw-r--r-- 1 yangxiufeng staff 125M Apr 30 16:42 hacknernews_1m.csv.gz
(venv) ➜ test git:(stage2) curl -XPUT 'http://root:@127.0.0.1:8000/v1/streaming_load' -H 'insert_sql: insert into hackernews_1m FILE_FORMAT = (type = TSV compression=gzip) ' -F 'upload=@"./hacknernews_1m.csv.gz"'
{"id":"c0766b8a-77c5-477c-9a48-01e09a895ad1","state":"SUCCESS","stats":{"rows":1000000,"bytes":130618406},"error":null,"files":["hacknernews_1m.csv.gz"]}%
(venv) ➜ test git:(stage2) curl -XPUT 'http://root:@127.0.0.1:8000/v1/streaming_load' -H 'insert_sql: insert into hackernews_1m FILE_FORMAT = (type = TSV compression=none) ' -F 'upload=@"./hacknernews_1m.csv"'
{"id":"08b3d058-13ab-4303-8925-e40ac971a2dd","state":"SUCCESS","stats":{"rows":1000000,"bytes":358551613},"error":null,"files":["hacknernews_1m.csv"]}%
(venv) ➜ test git:(stage2) curl -XPUT 'http://root:@127.0.0.1:8000/v1/streaming_load' -H 'insert_sql: insert into hackernews_1m FILE_FORMAT = (type = TSV compression=none) ' -F 'upload=@"./hacknernews_1m.csv"'
{"id":"6b5a826b-ec49-47e0-a2ba-75bf372dd36f","state":"SUCCESS","stats":{"rows":1000000,"bytes":358551613},"error":null,"files":["hacknernews_1m.csv"]}%
❯ curl -XPUT 'http://root:@127.0.0.1:8000/v1/streaming_load' -H 'insert_sql: insert into hackernews_1m FILE_FORMAT = (type = TSV) ' -F 'upload=@"./hacknernews_1m.csv"'
{"id":"546943ab-f100-4a39-aee8-adca6ca27596","state":"SUCCESS","stats":{"rows":2000000,"bytes":642061026},"error":null,"files":["hacknernews_1m.csv"]}%
❯ wc -l ./hacknernews_1m.csv
1000000 ./hacknernews_1m.csv
❯ ls -lh hacknernews_1m.csv*
-rw-r--r-- 1 sundy sundy 342M Sep 3 2023 hacknernews_1m.csv
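The check above (the reported `rows` against `wc -l`) can also be scripted. This is a minimal sketch, assuming one record per newline-terminated line; `count_lines` and `rows_match` are hypothetical helpers, and the figures in `main` are the ones from the output above.

```rust
use std::io::{self, BufRead, Cursor};

/// Count lines the way `wc -l` does: one per b'\n' byte.
fn count_lines<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut count = 0u64;
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            break; // end of input
        }
        count += buf.iter().filter(|&&b| b == b'\n').count() as u64;
        let len = buf.len();
        reader.consume(len);
    }
    Ok(count)
}

/// True when the `rows` value from the load response equals the line count.
fn rows_match(reported_rows: u64, line_count: u64) -> bool {
    reported_rows == line_count
}

fn main() -> io::Result<()> {
    // Tiny in-memory TSV sample standing in for the real file.
    let sample = Cursor::new(&b"1\tfoo\n2\tbar\n"[..]);
    let lines = count_lines(sample)?;
    println!("sample lines: {}", lines);

    // Figures from the thread: 2000000 reported rows, 1000000 lines.
    if !rows_match(2_000_000, 1_000_000) {
        println!("mismatch: reported rows differ from line count");
    }
    Ok(())
}
```

To run it against the real file, the `Cursor` could be replaced with a `std::io::BufReader` over `std::fs::File::open("./hacknernews_1m.csv")`.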
Summary