Protryon / klickhouse

Rust crate for accessing Clickhouse
Apache License 2.0

Composability of the Row trait, with flattening or implementation on tuples #30

Closed cpg314 closed 10 months ago

cpg314 commented 10 months ago

Problem statement

The Row trait allows converting a Rust struct to and from Clickhouse rows, e.g.

#[derive(klickhouse::Row)]
struct User {
  age: u8,
  name: String
}
let users: Vec<User> = ch.query_collect("SELECT age, name FROM users").await?;

Now assume we want to retrieve the user details and the account balance in the same query. We would like to write something like:

let users: Vec<(u32, User)> = ch.query_collect("SELECT credits, (age, name) FROM ...").await?;
// or
#[derive(klickhouse::Row)]
struct Row {
  #[klickhouse(flatten)]
  user: User,
  credits: u32
}
let users: Vec<Row> = ch.query_collect("SELECT age, name, credits FROM ...").await?;

That is, both variants compose with the existing implementation of Row for User.

Neither approach is currently supported:

Rather, one has to manually implement Row (subject to some issues described below).
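To make the manual route concrete, here is a minimal sketch of a composed row that delegates to the nested type's implementation. SimpleRow, Value, take_from and UserWithCredits are hypothetical stand-ins invented for this illustration; klickhouse's actual Row trait has a different signature and also covers serialization.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

// Hypothetical, heavily simplified stand-in for klickhouse's value type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    UInt(u64),
    Text(String),
}

// Hypothetical stand-in for the `Row` trait (deserialization side only).
pub trait SimpleRow: Sized {
    /// Consume the columns this type needs from `cols`.
    /// Returns `None` if a column is missing or has an unexpected type.
    fn take_from(cols: &mut HashMap<String, Value>) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
pub struct User {
    pub age: u8,
    pub name: String,
}

impl SimpleRow for User {
    fn take_from(cols: &mut HashMap<String, Value>) -> Option<Self> {
        let age = match cols.remove("age")? {
            Value::UInt(n) => u8::try_from(n).ok()?,
            _ => return None,
        };
        let name = match cols.remove("name")? {
            Value::Text(s) => s,
            _ => return None,
        };
        Some(User { age, name })
    }
}

#[derive(Debug, PartialEq)]
pub struct UserWithCredits {
    pub user: User,
    pub credits: u32,
}

impl SimpleRow for UserWithCredits {
    fn take_from(cols: &mut HashMap<String, Value>) -> Option<Self> {
        // Flattening by hand: the nested type reads its own columns from the same map.
        let user = User::take_from(cols)?;
        let credits = match cols.remove("credits")? {
            Value::UInt(n) => u32::try_from(n).ok()?,
            _ => return None,
        };
        Some(UserWithCredits { user, credits })
    }
}
```

In this sketch the outer type never needs to know which columns User consumes, which is the composability property a flatten attribute would generate automatically.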

Implementation possibilities

Row on composed types

For the first approach, one would implement Row on tuples implementing Row.

For (T1, ..., Tn), one would either:

flatten support

For serialization and deserialization, one would:

One minor annoyance is the signature

fn deserialize_row(map: Vec<(&str, &Type, Value)>) -> Result<Self>;

which does not allow efficient retrieval of the fields to pass to the recursive deserialize_row calls (i.e. one needs to search for them linearly and swap out the Value). This problem would disappear if the signature were

fn deserialize_row(map: IndexMap<String, (Type, Value)>) -> Result<Self>;

which is actually a pretty simple change in the code, but a breaking API change. The inefficiency probably does not matter with a small number of fields.
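As a rough illustration of the linear search mentioned above, the following sketch pulls the columns belonging to a nested type out of the Vec-based map. Type and Value here are simplified stand-ins, and take_fields is a hypothetical helper, not part of klickhouse.

```rust
// Hypothetical stand-ins for klickhouse's `Type` and `Value`; the real types are richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    UInt8,
    UInt32,
    String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    UInt8(u8),
    UInt32(u32),
    String(String),
    Null,
}

/// Collect the columns named in `wanted` from `map`, in the order of `wanted`.
/// Each value is swapped out for `Value::Null` so that it can be moved
/// without cloning. Every lookup is a linear scan, so the total cost is
/// O(wanted.len() * map.len()).
pub fn take_fields<'a>(
    map: &mut [(&'a str, &'a Type, Value)],
    wanted: &[&str],
) -> Vec<(&'a str, &'a Type, Value)> {
    let mut out = Vec::with_capacity(wanted.len());
    for name in wanted {
        if let Some(pos) = map.iter().position(|(n, _, _)| n == name) {
            let (n, t, v) = &mut map[pos];
            out.push((*n, *t, std::mem::replace(v, Value::Null)));
        }
    }
    out
}
```

With a map keyed by column name, each lookup would instead be a single (amortized constant time) removal, which is the motivation for the alternative signature.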

Subfield addressing

Another approach is to allow deriving

#[derive(klickhouse::Row)]
struct Row {
  user: User,
  credits: u32
}

where the User fields are serialized and deserialized with a user prefix, e.g.

SELECT name AS "user.name", age AS "user.age", credits FROM ...

This could happen either implicitly or via another attribute on the user: User field.
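One way the prefix handling could work on the deserialization side is sketched below: column names under the prefix are routed to the nested type with the prefix removed, and everything else stays with the outer type. split_by_prefix is a hypothetical helper written for this illustration only.

```rust
/// Split column names into those under `prefix.` (returned with the prefix
/// and the dot stripped) and all remaining columns.
pub fn split_by_prefix<'a>(
    columns: &[&'a str],
    prefix: &str,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut nested = Vec::new();
    let mut rest = Vec::new();
    for &col in columns {
        // Require the dot separator so that e.g. "username" is not
        // mistaken for a field of "user".
        match col.strip_prefix(prefix).and_then(|s| s.strip_prefix('.')) {
            Some(field) => nested.push(field),
            None => rest.push(col),
        }
    }
    (nested, rest)
}
```

For the query above, the columns "user.name" and "user.age" would be handed to User as "name" and "age", while "credits" would remain with the outer struct.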

Comments?

I have a working draft for the flatten approach, and I might have a go at the tuple approach too.

Before cleaning this up and submitting a PR, I wanted to ask whether you had any comments or thoughts about this?

I only recently started using clickhouse and this (very nice) crate, and I might be missing some subtleties.