ClickHouse / clickhouse-rs

Official pure Rust typed client for ClickHouse DB
https://clickhouse.com
Apache License 2.0
282 stars 83 forks source link

Implement `Stream` for `RowCursor` #122

Open v3xro opened 1 month ago

v3xro commented 1 month ago

This would make usage within other async code more ergonomic.

loyd commented 1 month ago

Makes sense, possibly under dedicated futures03 feature

v3xro commented 1 month ago

I've had an initial attempt at implementing stream using https://github.com/tokio-rs/async-stream. However, during testing against a non-toy database I found cases where the deserialized strings are invalid utf8 which points to some buffer corruption somewhere. Not entirely sure where to start debugging that as it does not happen when you use the cursor directly manually. this is the current branch in case that's helpful: https://github.com/v3xro/clickhouse.rs/tree/feat/impl-stream

Edit 1: I found https://github.com/tokio-rs/async-stream/issues/106 so will re-check, seems quite likely Edit 2: That not the issue but #24 looks suspicious

v3xro commented 1 month ago

@loyd do you have an idea how to fix #24? The sqlx type signature linked in the ticket did not inspire me to a solution :grin:

Looking at the code, it looks like you would need to move the lifetime of the buffer into the cursor type so you can properly have the serde 'de lifetime be a subordinate of that? Ok, as I was writing that I realised that's what you meant by GAT being stable (which is the case now unlike 3 years ago).

Edit: did more research, found #46 and now wondering whether it would even be possible to provide a generic Stream API without implementing the copy variant of serde deserializers

loyd commented 1 month ago

24 affects your case only if you want to use borrowed rows in Stream impl. They are fundamentally incompatible (anyone can call StreamExt::buffered), only T: for<'b> Deserialize<'b> can be used in that case (see Query::fetch_all).

So, you should add such restrictions on https://github.com/v3xro/clickhouse.rs/commit/d25b1bba8ea91c67f17f3eba7af7df9b52935b4c#diff-f0c63cae3ed0328ebbe14ac22d86b43bc8a40146944b03c8326681e4f3802443R200 as where T: for<'b> Deserialize<'b> and note in docstrings that only owned T is supported

loyd commented 1 month ago

And I'm sure we should avoid async-stream. It is a good implementation, but has disadvantages:

It seems it can be easily replaced with unfold here.

loyd commented 1 month ago

do you have an idea how to fix https://github.com/ClickHouse/clickhouse.rs/issues/24?

I do, but not ideal ones (with breaking changes in API sadly). Let's continue in #24 if you want to help with the issue or have ideas