chdb-io / chdb-server-bak

API Server for chDB, an in-process SQL OLAP Engine powered by ClickHouse
https://chdb.io
Apache License 2.0

Weird behaviour on requests with newlines #9

Open akvlad opened 10 months ago

akvlad commented 10 months ago
$ curl -vv -X POST http://a:b@localhost:8123 --data 'CREATE DATABASE test';
$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123 --data-binary @-
CREATE TABLE IF NOT EXISTS test.settings (fingerprint UInt64, type String, name String, value String, inserted_at DateTime64(9, 'UTC')) ENGINE = ReplacingMergeTree(inserted_at) ORDER BY fingerprint

EOF

$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123/?database=test --data-binary @-
INSERT INTO settings (fingerprint, type, name, value, inserted_at) VALUES (cityHash64('update_v3_5'), 'update',
     'v3_1', toString(toUnixTimestamp(NOW())), NOW())
EOF

< HTTP/1.1 400 BAD REQUEST
< Server: Werkzeug/3.0.1 Python/3.8.10
< Date: Fri, 03 Nov 2023 15:40:44 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 226
< Connection: close
< 
Code: 62. DB::Exception: Code: 62. DB::Exception: Cannot parse expression of type String here: : While executing ValuesBlockInputFormat: data for INSERT was parsed from query. (SYNTAX_ERROR) (version 23.6.1.1). (SYNTAX_ERROR)
* Closing connection 0

The third request fails with HTTP 400 and a misleading error message. By contrast, the same INSERT written on a single line succeeds:

$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123/?database=test --data-binary @-
INSERT INTO settings (fingerprint, type, name, value, inserted_at) VALUES (cityHash64('update_v3_5'), 'update', 'v3_1', toString(toUnixTimestamp(NOW())), NOW())

EOF
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1:8123...
* Connected to localhost (127.0.0.1) port 8123 (#0)
* Server auth using Basic with user 'a'
> POST /?database=test HTTP/1.1
> Host: localhost:8123
> Authorization: Basic YTpi
> User-Agent: curl/7.88.1
> Accept: */*
> Content-Length: 162
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Server: Werkzeug/3.0.1 Python/3.8.10
< Date: Fri, 03 Nov 2023 15:45:31 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< Connection: close
< 
* Closing connection 0

When the same request is sent without a newline inside the statement, the data is ingested successfully. The server should tolerate newlines in the middle of a request, for example by stripping them before the query is executed.
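As an illustration of the behaviour requested above, here is a minimal sketch of how a server could normalize a plaintext query body before passing it on. The function name `collapse_newlines` is hypothetical and this is not the chdb-server implementation. It assumes single-quoted string literals with backslash escapes, and it does not handle `--` line comments, which would swallow the rest of the query once the newline is removed.

```python
def collapse_newlines(query: str) -> str:
    """Replace newlines outside single-quoted literals with a space.

    Hypothetical helper: newlines inside string literals are kept,
    because they are part of the data rather than query layout.
    """
    out = []
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(query):
                # Copy the escaped character verbatim so \' does not end the literal.
                out.append(query[i + 1])
                i += 1
            elif ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
            out.append(ch)
        elif ch in "\r\n":
            # Layout newline between tokens: a space keeps tokens separated.
            out.append(" ")
        else:
            out.append(ch)
        i += 1
    return "".join(out).strip()


if __name__ == "__main__":
    body = "INSERT INTO settings (name) VALUES ('update',\n     'v3_1')\n"
    print(collapse_newlines(body))
```

Replacing the newline with a space rather than deleting it avoids gluing two tokens together when the line break was the only separator between them.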

lmangani commented 10 months ago

Current support is essentially plaintext only. To properly support POST data (including binary formats), we should expose a data hook in chdb equivalent to stdin (or --data) in clickhouse-local:

# python3 -m chdb "SELECT 1", "Native" > output.native
# python3 -m chdb "SELECT * FROM table" < output.native
1 

@auxten is there a way to expose a hook for passing data into chdb without resorting to raw stdin?

lmangani commented 10 months ago

The data pipe won't be available until a future version of chdb supports it without stdin hacks. I've implemented a workaround in 0.15.3 that should address the newline issues until then.

lmangani commented 9 months ago

The workaround was weak and didn't cover binary protocols, so as predicted this needs a stdin hack to function. Here's a prototype to test: https://github.com/chdb-io/chdb-server/pull/11
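For readers unfamiliar with the idea, a stdin-based approach generally means running the query in a child process and piping the raw POST body to it. The sketch below shows only that general pattern and is not the code from the linked pull request. The helper name `run_with_stdin` is hypothetical, and whether a given chdb version actually reads input data from stdin is an assumption that should be checked before use.

```python
import subprocess
import sys
from typing import List


def run_with_stdin(cmd: List[str], body: bytes, timeout: float = 30.0) -> bytes:
    """Run cmd, feed body to its stdin unchanged, and return its stdout.

    Hypothetical helper: bytes are passed through without decoding, so
    binary payloads are not mangled the way text handling would mangle them.
    """
    proc = subprocess.run(
        cmd,
        input=body,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if proc.returncode != 0:
        # Surface the child's error text to the caller (e.g. as an HTTP 400 body).
        raise RuntimeError(proc.stderr.decode("utf-8", errors="replace"))
    return proc.stdout


if __name__ == "__main__":
    # Stand-in child process that echoes stdin, so the sketch runs without chdb.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    print(run_with_stdin(echo, b"\x00\x01binary\npayload"))
```

In a server, `cmd` would be whatever command line invokes the engine for the incoming query, and `body` would be the unmodified request body. Spawning a process per request is costly, which is presumably why an in-process data hook is the preferred long-term fix.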