Protryon / klickhouse

Rust crate for accessing Clickhouse
Apache License 2.0
92 stars 21 forks source link

Type hinting in serialization depends on field ordering #34

Closed cpg314 closed 9 months ago

cpg314 commented 10 months ago

When struct fields are not declared in the same order as in the database, type hints will be passed to the wrong fields.

This affects only structs with fields requiring a type hint (decimal, bytes...) and results in an error from the server.

Detailed description

The Row::serialize_row method signature is

    fn serialize_row(
        self,
        type_hints: &[&Type]
    ) -> Result<Vec<(Cow<'static, str>, Value)>>;

It is called in client.rs as follows:

let types = first_block.column_types.values().collect::<Vec<_>>();
rows.into_iter()
    .map(|x| x.serialize_row(&types[..]))

The Row derive macro implements serialize_row by iterating over fields, and calling types[idx] to get the type hint for each field

There is however no guarantee that the fields order will match between the ones in the struct declaration and in first_block (returned by the server).

Example

#[derive(klickhouse::Row)]
struct TestRow {
    a: klickhouse::Bytes,
    b: klickhouse::Bytes,
}

#[tokio::test]
async fn blah() {
    let client = get_client().await;

    prepare_table(
        "test",
        "
         a Array(UInt8),
         b String",
        &client,
    )
    .await;

    client
        .insert_native_block(
            "INSERT INTO test FORMAT Native",
            vec![TestRow {
                a: vec![].into(),
                b: vec![].into(),
            }],
        )
        .await
        .unwrap();
}

This test passes, but fails if a and b are switched in the struct declaration:

struct TestRow {
    b: klickhouse::Bytes,
    a: klickhouse::Bytes,
}

Indeed, b now gets the type hint meant for a.

Solution

The solution is pretty simple but requires a change to the Row trait:

In client.rs, column_types is originally an IndexMap in a Block. Rather than passing it as a slice, one should pass the IndexMap, allowing to safely retrieve field type hints by name.

This would change the signature of serialize_row to

fn serialize_row(
        self,
        type_hints: &IndexMap<String, Type>,
    ) -> Result<Vec<(Cow<'static, str>, Value)>>;
Protryon commented 10 months ago

Yeah, I'm on board for this, just requires a minor version bump due to breaking change. I've been aware of this limitation for a while.

cpg314 commented 10 months ago

Ok, I'm happy to send a PR if you wish. Thanks a lot for your review of the others :)

Protryon commented 10 months ago

Yep @cpg314, your contributions are much appreciated. :) I added you as a contributor so you can stack your PRs.

Protryon commented 9 months ago

Fixed by #37