datafuselabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.67k stars 727 forks source link

NULL value are not correctly distinguished via the HTTP API #16076

Open wyhaya opened 1 month ago

wyhaya commented 1 month ago

Summary

create table users (name string);
insert into users (name) values (null), ('NULL');
curl -u root: -X POST
     -H "Content-Type: application/json" 
     -d '{"sql":"select * from users;"}' 
     http://localhost:8000/v1/query
Output
"schema": [
    {"name": "name",  "type": "Nullable(String)"}
],
"data": [
    ["NULL"],
    ["NULL"]
],

There is no way to tell which of these two values is NULL and which is "NULL".


If we use the Rust driver, we get two NULL, which is wrong.

[driver/examples/test.rs:10:5] query = [
    Row(
        [
            Null,
        ],
    ),
    Row(
        [
            Null,
        ],
    ),
]

The expected result is:

[driver/examples/test.rs:10:5] query = [
    Row(
        [
            Null,
        ],
    ),
    Row(
        [
            String(
                "NULL",
            ),
        ],
    ),
]
wyhaya commented 1 month ago

There's a similar problem in ClickHouse, but it returns a weird ᴺᵁᴸᴸ as long as your string isn't "ᴺᵁᴸᴸ" then it is properly distinguished in most cases.

"data":[
    ["ᴺᵁᴸᴸ"]
],

ClickHouse Query

youngsofun commented 1 month ago

we will fix it by representing null just as json null. inside, StringBlock should be [[Option<String>]] instead of [[String]]

wyhaya commented 1 month ago
   pub session: Option<SessionState>,
   pub schema: Vec<SchemaField>,
-  pub data: Vec<Vec<String>>,
+  pub data: Vec<Vec<Option<String>>>,

Previously all values ​​were String, but now the value may be String | Null. How to ensure client compatibility?