Open buf1024 opened 1 year ago
AFAIK you can't write FixedStrings like String - CH wire format doesn't expect a size header like how Strings are serialized. You'll have to manually write each byte in the FixedString using repeated calls to serialize_u8
.
See https://github.com/loyd/clickhouse.rs/blob/d5926b12007cd20348ce945b074442395e5fd1d2/src/rowbinary/ser.rs#L69 - you can't have the serializer write out the length.
AFAIK you can't write FixedStrings like String - CH wire format doesn't expect a size header like how Strings are serialized. You'll have to manually write each byte in the FixedString using repeated calls to
serialize_u8
.See
- you can't have the serializer write out the length.
that means this crate do not support the CH base type FixedString ?
Well, it does, but it's similar to the discussion for https://github.com/loyd/clickhouse.rs/issues/48 - you'll need to use serde_with
to ensure the serializer does the appropriate thing.
All right, String
cannot be written to FixedString(_)
without serde(with)
. I think about clickhouse::serde::fixed_string
for it, https://github.com/serde-rs/serde/issues/1059#issuecomment-1211005392
Now I'm adding clickhouse::serde
helpers (time
, ipv4
, uuid
are already added), so it seems reasonable to support the next code
#[derive(Row, Serialize)]
struct MyRow {
#[serialize_with = "clickhouse::serde::fixed_string::<_, 8>"]
s: String, // actually any ref to &str
}
Any updates about FixedString
support, or does anyone have alter methods like the UInt256
's serde mod?
So I've implemented the following,
use clickhouse::Row;
use serde::Serialize;
pub mod fixed_string {
use serde::{ser::Serializer, Serialize};
pub fn serialize<S: Serializer, const LEN: usize>(
str: &str,
ser: S,
) -> Result<S::Ok, S::Error> {
let str_bytes = str.as_bytes();
if str_bytes.len() >= LEN {
log::debug!(
"[Ser] length are more than equal to {}, serialized to: `{:?}`",
LEN,
&str_bytes[..LEN]
);
str_bytes[..LEN].serialize(ser)
} else {
let mut padding: [u8; LEN] = [0; LEN];
padding[..str_bytes.len()].copy_from_slice(str_bytes);
log::debug!(
"[Ser] length are less than equal to {}, serialized to: `{:?}`",
LEN,
padding
);
padding.serialize(ser)
}
}
}
#[derive(Debug, Row, Serialize)]
pub struct Test {
#[serde(serialize_with = "fixed_string::serialize::<_, 8>")]
pub id: String,
}
Then do a test on Clickhouse 23.11, it still results in the following error,
[2024-04-04T06:41:39Z ERROR server_rs::queue] [Clickhouse] error on committing message: bad response: Code: 33. DB::Exception: Cannot read all data. Bytes read: 1. Bytes expected: 8.: (at row 2)
: While executing BinaryRowInputFormat. (CANNOT_READ_ALL_DATA) (version 23.11.5.29 (official build))
Is there an extra byte sent by the library? Since the error comes on the first row, but it indicated there's a second row.
Also here are the sql file I use to make the table,
create or replace table events_dev.test ( id FixedString(8) ) engine = MergeTree() primary key (id)
Anyone can maybe steer me to the right direction here?
Also I've busted out tcpdump
on the http port of Clickhouse to see the request that was sent to Clickhouse, and it seems like the library does add the length to the start of the already serialized string (? CMIIW).
Is there any way to basically make the library ignore adding the length to the serialized string? (@loyd) Or maybe it's part of serde implementation?
p.s. If you're curios about the packet, here's the dump
0000 86 dd 00 00 00 00 00 01 03 04 00 06 00 00 00 00 ................
0010 00 00 00 00 60 03 35 90 01 27 06 40 00 00 00 00 ....`.5..'.@....
0020 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 ................
0030 00 00 00 00 00 00 00 00 00 00 00 01 d8 64 1f bb .............d..
0040 34 1b 5d b2 40 f4 b6 f6 80 18 01 04 01 2f 00 00 4.].@......../..
0050 01 01 08 0a d8 57 ce 61 d8 57 ce 61 50 4f 53 54 .....W.a.W.aPOST
0060 20 2f 3f 64 61 74 61 62 61 73 65 3d 65 76 65 6e /?database=even
0070 74 73 5f 64 65 76 26 71 75 65 72 79 3d 49 4e 53 ts_dev&query=INS
0080 45 52 54 2b 49 4e 54 4f 2b 74 65 73 74 25 32 38 ERT+INTO+test%28
0090 25 36 30 69 64 25 36 30 25 32 39 2b 46 4f 52 4d %60id%60%29+FORM
00a0 41 54 2b 52 6f 77 42 69 6e 61 72 79 26 64 65 63 AT+RowBinary&dec
00b0 6f 6d 70 72 65 73 73 3d 31 20 48 54 54 50 2f 31 ompress=1 HTTP/1
00c0 2e 31 0d 0a 78 2d 63 6c 69 63 6b 68 6f 75 73 65 .1..x-clickhouse
00d0 2d 75 73 65 72 3a 20 63 6c 69 63 6b 68 6f 75 73 -user: clickhous
00e0 65 0d 0a 78 2d 63 6c 69 63 6b 68 6f 75 73 65 2d e..x-clickhouse-
00f0 6b 65 79 3a 20 63 6c 69 63 6b 68 6f 75 73 65 0d key: clickhouse.
0100 0a 68 6f 73 74 3a 20 6c 6f 63 61 6c 68 6f 73 74 .host: localhost
0110 3a 38 31 32 33 0d 0a 74 72 61 6e 73 66 65 72 2d :8123..transfer-
0120 65 6e 63 6f 64 69 6e 67 3a 20 63 68 75 6e 6b 65 encoding: chunke
0130 64 0d 0a 0d 0a 32 33 0d 0a 1c fb b3 90 92 6a d7 d....23.......j.
0140 60 24 98 bd 5f e7 02 b3 b0 82 13 00 00 00 09 00 `$.._...........
0150 00 00 90 08 30 31 48 54 4b 58 45 58 0d 0a 30 0d ....01HTKXEX..0.
0160 0a 0d 0a ...
@fauh45 sorry for late response.
You should use https://docs.rs/serde/latest/serde/trait.Serializer.html#tymethod.serialize_tuple or multiple serialize_u8()
as suggested above by @shenghaoyang in order to serialize bytes/strings without length.
Somehow I am able to use it. I am decoding it to a [u8; 4]
when using FixedString(4)
It seems more than 1 FixedString not work. demo code:
the panic information:
the call stack point to
insert.end().await.unwrap();
rust version:
clickhouse version: