Praying opened this issue 4 years ago
For the base serialization crate, there was this benchmark a long time ago: https://dwrensha.github.io/capnproto-rust/2014/01/15/benchmark-update.html. The benchmark code is still checked in, in the benchmark/ directory.

For the capnp-rpc crate, I am unaware of any benchmarks.
I added capnproto to llogiq's serdebench a while ago. However, the results on my machine weren't so hot compared with flatbuffers, so maybe I did something wrong?
@dureuill Thanks. I opened https://github.com/llogiq/serdebench/pull/4.
I suspect one of the reasons that flatbuffers deserialization appears to be faster is that capnproto-rust is doing UTF-8 validation but (as far as I know) flatbuffers is not. I wonder if we should adjust the API of capnp::text::Reader so that consumers don't need to perform that validation if they don't want to. (This isn't the first time that the validation has noticeably shown up in a benchmark.)
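For illustration, one possible shape for that (a hypothetical sketch, not the current capnp API; the names are illustrative): hand out the raw bytes for free and make validation opt-in.

```rust
// Hypothetical sketch of a validation-optional text reader.
// Not the current capnp::text::Reader API.
pub struct Reader<'a> {
    bytes: &'a [u8],
}

impl<'a> Reader<'a> {
    /// Free: raw bytes, no UTF-8 validation performed.
    pub fn as_bytes(&self) -> &'a [u8] {
        self.bytes
    }

    /// Opt-in: validate only when the consumer actually needs a &str.
    pub fn to_str(&self) -> Result<&'a str, std::str::Utf8Error> {
        std::str::from_utf8(self.bytes)
    }
}
```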
On closer examination, the UTF-8 validation does not look as significant as I had initially thought. I now suspect it's mainly the pointer indirection that's expensive for capnproto-rust, and reading a List(Text) involves a lot of pointer indirection.
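To make that concrete, here is a sketch of the access pattern (assuming a schema field of type List(Text) that has already been read into a text_list::Reader):

```rust
// Sketch: the list itself sits behind one pointer, and each element is a
// further pointer to its own text blob, so every get() is another hop.
fn touch_all(names: capnp::text_list::Reader<'_>) -> Result<(), capnp::Error> {
    for i in 0..names.len() {
        // Each get() follows a pointer from the list to a separate text blob
        // (and, currently, also runs UTF-8 validation on it).
        let _name = names.get(i)?;
    }
    Ok(())
}
```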
Hi, I made a simple ping-pong test tool for Cap'n Proto RPC. The only metric it reports is QPS. Both server and client are single-threaded, and you can specify the number of connections when starting the client. Here are the benchmark results on my machine:
| # of connections | QPS   |
|------------------|-------|
| 1                | 19704 |
| 2                | 27805 |
| 3                | 27409 |
| 4                | 29099 |
| 5                | 29515 |
| 6                | 30166 |
| 7                | 30388 |
| 8                | 30815 |
| 9                | 30803 |
| 10               | 30731 |
| 20               | 32037 |
| 30               | 32242 |
| 40               | 32132 |
| 50               | 33079 |
| 60               | 32294 |
| 70               | 32213 |
Honestly, I suspect there is something wrong in my code because the QPS is relatively low; I would like to hear some thoughts on this :)
@SteveLauC thanks for sharing!
The capnp-rpc crate has received a lot less optimization attention than the base capnp crate. In particular, I expect that it's doing significantly more memory allocation than it strictly needs to.
When you say "the QPS is relatively low", what are you comparing to? What kind of numbers would be more in line with your expectations?
Hi,

> The capnp-rpc crate has received a lot less optimization attention than the base capnp crate. In particular, I expect that it's doing significantly more memory allocation than it strictly needs to.
Thanks for letting me know!
When you say "the QPS is relatively low", what are you comparing to?
Maybe a ping-pong server speaking HTTP? I built one with actix-web; with 1 core (1 worker thread), it can handle 100k queries per second on my laptop.
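For reference, the HTTP side was roughly this shape (a minimal sketch of a single-worker actix-web ping server, not necessarily the exact code I used):

```rust
use actix_web::{web, App, HttpServer, Responder};

// Minimal ping handler: respond with a constant body.
async fn ping() -> impl Responder {
    "pong"
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| App::new().route("/ping", web::get().to(ping)))
        .workers(1) // one worker thread, i.e. one core
        .bind(("127.0.0.1", 8080))?
        .run()
        .await
}
```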
> What kind of numbers would be more in line with your expectations?
Maybe at least 50k with one core?
https://github.com/capnproto/capnproto-rust/commit/696cdb969cdec275f668fa60c6a35b6cf166f731 should make things a little bit faster. What would help even more would be to allow the end user to specify a size hint when initializing a request, so that in your example you could do something like:

```rust
let mut request = client.ping_request(20); // expect the message to be at most 20 words long
```
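One way such a hint could plausibly be wired through (a sketch, not actual capnp-rpc code: it just leans on the existing first_segment_words knob of capnp's HeapAllocator; new_request_message is a hypothetical helper):

```rust
use capnp::message;

// Sketch: size the message's first segment from the caller's hint so that a
// small request never needs a second segment allocation.
fn new_request_message(size_hint_words: u32) -> message::Builder<message::HeapAllocator> {
    let allocator = message::HeapAllocator::new().first_segment_words(size_hint_words);
    message::Builder::new(allocator)
}
```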
Using buffered I/O objects seems to help a lot:
```diff
diff --git a/src/bin/client.rs b/src/bin/client.rs
index 0c29261..191c1e4 100644
--- a/src/bin/client.rs
+++ b/src/bin/client.rs
@@ -65,6 +65,8 @@ fn main() {
         // let stream_poll = stream.try_into_poll_io().unwrap();
         let (reader, writer) =
             tokio_util::compat::TokioAsyncReadCompatExt::compat(stream).split();
+        let reader = futures::io::BufReader::new(reader);
+        let writer = futures::io::BufWriter::new(writer);
         let rpc_network = Box::new(capnp_rpc::twoparty::VatNetwork::new(
             reader,
             writer,
diff --git a/src/bin/server.rs b/src/bin/server.rs
index 5444dfa..e450cbc 100644
--- a/src/bin/server.rs
+++ b/src/bin/server.rs
@@ -63,6 +63,8 @@ fn main() {
         let (reader, writer) =
             tokio_util::compat::TokioAsyncReadCompatExt::compat(stream).split();
+        let reader = futures::io::BufReader::new(reader);
+        let writer = futures::io::BufWriter::new(writer);
         let network = capnp_rpc::twoparty::VatNetwork::new(
             reader,
             writer,
```
This gives me roughly 35% higher QPS, presumably because buffering coalesces the RPC layer's many small reads and writes into fewer syscalls.
We should probably update the capnp-rpc examples to include this kind of wrapping.
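In non-diff form, the wrapping would look roughly like this in an example (client side; assumes tokio, tokio-util with its compat feature, and futures, as in the diff above):

```rust
use futures::AsyncReadExt; // for split()
use tokio_util::compat::TokioAsyncReadCompatExt;

async fn connect(addr: &str) -> Result<(), Box<dyn std::error::Error>> {
    let stream = tokio::net::TcpStream::connect(addr).await?;
    let (reader, writer) = stream.compat().split();
    // Buffer both halves so the RPC layer's small reads/writes are coalesced.
    let reader = futures::io::BufReader::new(reader);
    let writer = futures::io::BufWriter::new(writer);
    let network = capnp_rpc::twoparty::VatNetwork::new(
        reader,
        writer,
        capnp_rpc::rpc_twoparty_capnp::Side::Client,
        Default::default(),
    );
    // ...hand `network` to a capnp_rpc::RpcSystem as usual...
    let _ = network;
    Ok(())
}
```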
> Using buffered I/O objects seems to help a lot:
Yeah, I have updated my benchmark results, and it can now reach 50k QPS with 6 connections. Thanks a lot!
Such as throughput compared with gRPC?