capnproto / capnproto-rust

Cap'n Proto for Rust
MIT License

Is there any benchmark? #197

Open Praying opened 4 years ago

Praying commented 4 years ago

such as throughput compared with gRPC?

dwrensha commented 4 years ago

For the base serialization crate, there was this benchmark a long time ago: https://dwrensha.github.io/capnproto-rust/2014/01/15/benchmark-update.html . The benchmark code is still checked in, in the benchmark/ directory.

For the capnp-rpc crate, I am unaware of any benchmarks.

dureuill commented 4 years ago

I added capnproto to llogiq's serdebench a while ago.

However, results on my machine weren't so hot compared with flatbuffers, so maybe I did something wrong?

dwrensha commented 4 years ago

@dureuill Thanks. I opened https://github.com/llogiq/serdebench/pull/4.

dwrensha commented 4 years ago

I suspect one of the reasons that flatbuffers deserialization appears to be faster is that capnproto-rust is doing utf-8 validation but (as far as I know) flatbuffers is not. I wonder if we should adjust the API of capnp::text::Reader so that consumers don't need to perform that validation if they don't want to. (This isn't the first time that the validation has noticeably shown up in a benchmark.)

dwrensha commented 4 years ago

On closer examination, the utf-8 validation does not look as significant as I had initially thought. I now suspect it's mainly the pointer indirection that's expensive for capnproto-rust, and reading a List(Text) involves a lot of pointer indirection.

SteveLauC commented 5 months ago

Hi, I made a simple ping-pong test tool for Cap'n Proto. The ONLY metric it reports is QPS. Both the server and the client are single-threaded, and you can specify the number of connections when starting the client. Here are the benchmark results on my machine:

| # of connections | QPS   |
|------------------|-------|
| 1                | 19704 |
| 2                | 27805 |
| 3                | 27409 |
| 4                | 29099 |
| 5                | 29515 |
| 6                | 30166 |
| 7                | 30388 |
| 8                | 30815 |
| 9                | 30803 |
| 10               | 30731 |
| 20               | 32037 |
| 30               | 32242 |
| 40               | 32132 |
| 50               | 33079 |
| 60               | 32294 |
| 70               | 32213 |

Honestly, I suspect there is something wrong in my code because the QPS is relatively low. I would like to hear some thoughts on this :)

dwrensha commented 5 months ago

@SteveLauC thanks for sharing!

The capnp-rpc crate has received a lot less optimization attention than the base capnp crate. In particular, I expect that it's doing significantly more memory allocation than it strictly needs to.

When you say "the QPS is relatively low", what are you comparing to? What kind of numbers would be more in line with your expectations?

SteveLauC commented 5 months ago

Hi

> The capnp-rpc crate has received a lot less optimization attention than the base capnp crate. In particular, I expect that it's doing significantly more memory allocation than it strictly needs to.

Thanks for letting me know!


> When you say "the QPS is relatively low", what are you comparing to?

Maybe a ping-pong server speaking HTTP? I built one with actix-web; with 1 core (1 worker thread), it can perform 100k queries per second on my laptop.

> What kind of numbers would be more in line with your expectations?

Maybe at least 50k with one core?

dwrensha commented 5 months ago

https://github.com/capnproto/capnproto-rust/commit/696cdb969cdec275f668fa60c6a35b6cf166f731 should make things a little bit faster. What would help even more would be to allow the end user to specify a size hint when initializing a request, so that in your example you could do something like

let mut request = client.ping_request(20); // expect message to be at most 20 words long
dwrensha commented 5 months ago

Using buffered I/O objects seems to help a lot:

diff --git a/src/bin/client.rs b/src/bin/client.rs
index 0c29261..191c1e4 100644
--- a/src/bin/client.rs
+++ b/src/bin/client.rs
@@ -65,6 +65,8 @@ fn main() {
                         // let stream_poll = stream.try_into_poll_io().unwrap();
                         let (reader, writer) =
                             tokio_util::compat::TokioAsyncReadCompatExt::compat(stream).split();
+                        let reader = futures::io::BufReader::new(reader);
+                        let writer = futures::io::BufWriter::new(writer);
                         let rpc_network = Box::new(capnp_rpc::twoparty::VatNetwork::new(
                             reader,
                             writer,
diff --git a/src/bin/server.rs b/src/bin/server.rs
index 5444dfa..e450cbc 100644
--- a/src/bin/server.rs
+++ b/src/bin/server.rs
@@ -63,6 +63,8 @@ fn main() {
                         let (reader, writer) =
                             tokio_util::compat::TokioAsyncReadCompatExt::compat(stream).split();

+                        let reader = futures::io::BufReader::new(reader);
+                        let writer = futures::io::BufWriter::new(writer);
                         let network = capnp_rpc::twoparty::VatNetwork::new(
                             reader,
                             writer,

This gives me a roughly 35% higher QPS.

We should probably update the capnp-rpc examples to include this kind of wrapping.

SteveLauC commented 5 months ago

> Using buffered I/O objects seems to help a lot:

Yeah, I have updated my benchmark results, and it can now reach 50k QPS with 6 connections. Thanks a lot!