ClickHouse / clickhouse-rs

Official pure Rust typed client for ClickHouse DB
https://clickhouse.com
Apache License 2.0
318 stars 88 forks source link

Support the `FixedString` type #49

Open buf1024 opened 1 year ago

buf1024 commented 1 year ago

It seems more than 1 FixedString not work. demo code:

let client = Client::default()
                .with_url("http://localhost:8123")
                .with_database("default");

            client.query(r###"create table if not exists tpayment
(
    code         FixedString(8),
    name         String,
    stock_code   FixedString(8)
)
    ENGINE = ReplacingMergeTree()
        PRIMARY KEY (name);"###).execute().await.unwrap();

            let mut insert = client.insert("tpayment").unwrap();
            for _ in 0..10 {
                insert.write(&TPayment {
                    code: "12345678".to_string(),
                    name: "foo".to_string(),
                    stock_code: "12345678".to_string(),
                }).await.unwrap();
            }
            insert.end().await.unwrap();

            let mut cursor = client
                .query("SELECT ?fields FROM tpayment")
                .fetch::<TPayment>().unwrap();

the panic information:

called `Result::unwrap()` on an `Err` value: BadResponse("Code: 33. DB::Exception: Cannot read all data. Bytes read: 3. Bytes expected: 53.: (at row 4)\n: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 21.12.3.32 (official build))")
thread 'store::clickhouse::clickhouse::tests::test' panicked at 'called `Result::unwrap()` on an `Err` value: BadResponse("Code: 33. DB::Exception: Cannot read all data. Bytes read: 3. Bytes expected: 53.: (at row 4)\n: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 21.12.3.32 (official build))")', data/src/store/clickhouse/clickhouse.rs:99:32
stack backtrace:

the call stack point to insert.end().await.unwrap();

rust version:

rustc 1.66.0-nightly (4a1467723 2022-09-23)

clickhouse version:

ClickHouse client version 21.12.3.32 (official build).
shenghaoyang commented 1 year ago

AFAIK you can't write FixedStrings like String - CH wire format doesn't expect a size header like how Strings are serialized. You'll have to manually write each byte in the FixedString using repeated calls to serialize_u8.

See https://github.com/loyd/clickhouse.rs/blob/d5926b12007cd20348ce945b074442395e5fd1d2/src/rowbinary/ser.rs#L69 - you can't have the serializer write out the length.

buf1024 commented 1 year ago

AFAIK you can't write FixedStrings like String - CH wire format doesn't expect a size header like how Strings are serialized. You'll have to manually write each byte in the FixedString using repeated calls to serialize_u8.

See

https://github.com/loyd/clickhouse.rs/blob/d5926b12007cd20348ce945b074442395e5fd1d2/src/rowbinary/ser.rs#L69

  • you can't have the serializer write out the length.

that means this crate do not support the CH base type FixedString ?

shenghaoyang commented 1 year ago

Well, it does, but it's similar to the discussion for https://github.com/loyd/clickhouse.rs/issues/48 - you'll need to use serde_with to ensure the serializer does the appropriate thing.

loyd commented 1 year ago

All right, String cannot be written to FixedString(_) without serde(with). I think about clickhouse::serde::fixed_string for it, https://github.com/serde-rs/serde/issues/1059#issuecomment-1211005392

Now I'm adding clickhouse::serde helpers (time, ipv4, uuid are already added), so it seems reasonable to support the next code

#[derive(Row, Serialize)]
struct MyRow {
    #[serialize_with = "clickhouse::serde::fixed_string::<_, 8>"]
    s: String, // actually any ref to &str
}
c0mm4nd commented 1 year ago

Any updates about FixedString support, or does anyone have alter methods like the UInt256's serde mod?

fauh45 commented 7 months ago

So I've implemented the following,

use clickhouse::Row;
use serde::Serialize;

pub mod fixed_string {
    use serde::{ser::Serializer, Serialize};

    pub fn serialize<S: Serializer, const LEN: usize>(
        str: &str,
        ser: S,
    ) -> Result<S::Ok, S::Error> {
        let str_bytes = str.as_bytes();

        if str_bytes.len() >= LEN {
            log::debug!(
                "[Ser] length are more than equal to {}, serialized to: `{:?}`",
                LEN,
                &str_bytes[..LEN]
            );

            str_bytes[..LEN].serialize(ser)
        } else {
            let mut padding: [u8; LEN] = [0; LEN];
            padding[..str_bytes.len()].copy_from_slice(str_bytes);

            log::debug!(
                "[Ser] length are less than equal to {}, serialized to: `{:?}`",
                LEN,
                padding
            );

            padding.serialize(ser)
        }
    }
}

#[derive(Debug, Row, Serialize)]
pub struct Test {
    #[serde(serialize_with = "fixed_string::serialize::<_, 8>")]
    pub id: String,
}

Then do a test on Clickhouse 23.11, it still results in the following error,

[2024-04-04T06:41:39Z ERROR server_rs::queue] [Clickhouse] error on committing message: bad response: Code: 33. DB::Exception: Cannot read all data. Bytes read: 1. Bytes expected: 8.: (at row 2)
    : While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.11.5.29 (official build))

Is there an extra byte sent by the library? Since the error comes on the first row, but it indicated there's a second row.

Also here are the sql file I use to make the table,

create or replace table events_dev.test ( id FixedString(8) ) engine = MergeTree() primary key (id)

Anyone can maybe steer me to the right direction here?

fauh45 commented 7 months ago

Also I've busted out tcpdump on the http port of Clickhouse to see the request that was sent to Clickhouse, and it seems like the library does add the length to the start of the already serialized string (? CMIIW).

image

Is there any way to basically make the library ignore adding the length to the serialized string? (@loyd) Or maybe it's part of serde implementation?

p.s. If you're curios about the packet, here's the dump

0000   86 dd 00 00 00 00 00 01 03 04 00 06 00 00 00 00   ................
0010   00 00 00 00 60 03 35 90 01 27 06 40 00 00 00 00   ....`.5..'.@....
0020   00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00   ................
0030   00 00 00 00 00 00 00 00 00 00 00 01 d8 64 1f bb   .............d..
0040   34 1b 5d b2 40 f4 b6 f6 80 18 01 04 01 2f 00 00   4.].@......../..
0050   01 01 08 0a d8 57 ce 61 d8 57 ce 61 50 4f 53 54   .....W.a.W.aPOST
0060   20 2f 3f 64 61 74 61 62 61 73 65 3d 65 76 65 6e    /?database=even
0070   74 73 5f 64 65 76 26 71 75 65 72 79 3d 49 4e 53   ts_dev&query=INS
0080   45 52 54 2b 49 4e 54 4f 2b 74 65 73 74 25 32 38   ERT+INTO+test%28
0090   25 36 30 69 64 25 36 30 25 32 39 2b 46 4f 52 4d   %60id%60%29+FORM
00a0   41 54 2b 52 6f 77 42 69 6e 61 72 79 26 64 65 63   AT+RowBinary&dec
00b0   6f 6d 70 72 65 73 73 3d 31 20 48 54 54 50 2f 31   ompress=1 HTTP/1
00c0   2e 31 0d 0a 78 2d 63 6c 69 63 6b 68 6f 75 73 65   .1..x-clickhouse
00d0   2d 75 73 65 72 3a 20 63 6c 69 63 6b 68 6f 75 73   -user: clickhous
00e0   65 0d 0a 78 2d 63 6c 69 63 6b 68 6f 75 73 65 2d   e..x-clickhouse-
00f0   6b 65 79 3a 20 63 6c 69 63 6b 68 6f 75 73 65 0d   key: clickhouse.
0100   0a 68 6f 73 74 3a 20 6c 6f 63 61 6c 68 6f 73 74   .host: localhost
0110   3a 38 31 32 33 0d 0a 74 72 61 6e 73 66 65 72 2d   :8123..transfer-
0120   65 6e 63 6f 64 69 6e 67 3a 20 63 68 75 6e 6b 65   encoding: chunke
0130   64 0d 0a 0d 0a 32 33 0d 0a 1c fb b3 90 92 6a d7   d....23.......j.
0140   60 24 98 bd 5f e7 02 b3 b0 82 13 00 00 00 09 00   `$.._...........
0150   00 00 90 08 30 31 48 54 4b 58 45 58 0d 0a 30 0d   ....01HTKXEX..0.
0160   0a 0d 0a                                          ...
loyd commented 3 months ago

@fauh45 sorry for late response.

You should use https://docs.rs/serde/latest/serde/trait.Serializer.html#tymethod.serialize_tuple or multiple serialize_u8() as suggested above by @shenghaoyang in order to serialize bytes/strings without length.

dgrr commented 3 months ago

Somehow I am able to use it. I am decoding it to a [u8; 4] when using FixedString(4)