jonathanstrong opened this issue 6 years ago
Thanks so much for the very detailed report!
It looks like the tail latency of `crossbeam-channel` is slightly worse than that of `multiqueue`. I'm curious if you have any commentary on these results? Would the difference in tail latency between these two crates matter in your use cases?
Also, FYI: This crate is going through a big revamp in #41 and the performance characteristics will almost surely change afterwards (for better or worse, but hopefully better).
Yes, it's slightly worse, although it was surprising to me that it was so close since the other library is more geared towards latency (as far as I know). In my use case this is a key metric and I'm using whatever is fastest, within reason.
Thanks for the heads up about the upcoming changes. I'm definitely watching this and the other crossbeam libraries with interest as they develop. Will post back with any big changes I notice. Had also wanted to check mpsc against crossbeam_channel head-to-head.
One other question, do you have any plans (or would you consider) offering a "broadcast"-type channel in the library directly? By broadcast I mean spmc (or mpmc), but each consumer gets every message. I'm not sure how common it is elsewhere but in my use cases it's frequent to need to send the same data to multiple threads, and it's been tougher to find good examples or info about how to do that the best way.
@jonathanstrong There's a simple broadcasting adapter in this PR: https://github.com/crossbeam-rs/crossbeam-channel/pull/33
Would something like that work for you?
@jonathanstrong
Just published version 0.2.0 of `crossbeam-channel`, which brings noticeable performance improvements in my benchmarks. I wonder how it'd fare on your tail latency benchmark, so if you could run it again, that'd be awesome!
may be a few days, but will do! thanks for the heads up.
results are in!
Setup for this run:

- channels are `bounded` with capacity 8
- crossbeam uses a `Vec` of `Sender`s, and sends a copy of the time to each sequentially
- compared against `multiqueue::broadcast_queue` (mpmc) and `std::mpsc` channels (mpsc with the same `Vec<Sender<_>>` approach as crossbeam). multiqueue (paging @schets) is based on the LMAX disruptor pattern and is designed with latency in mind specifically

First up, `std::mpsc`. At first glance it had dramatically worse results than the other two libraries, with an ugly 2ms worst case:
However, this only happened once at the beginning of the run, which is pretty common when measuring latency (but didn't happen to the others). Hopefully any programs relying on std are allowed to warm up before money or lives are on the line. Here is the data with the spike excluded:
99.99% at 30u, with a worst case of ~100u. Not terrible.
Now, for our main event, crossbeam_channel vs multiqueue:
On worst case, multiqueue still edges out crossbeam by a bit, but crossbeam is arguably better across the board.
Excluding the worst spike for each:
Closeup of a distinct edge for crossbeam around 99%:
Both libraries show a persistent, up to 10u edge over `std::mpsc` from 90% on:
Raw hdrhistogram log data: crossbeam-v0.2-latency-bench.v2z.gz
But wait, there's more! Since the results were so close, I ran crossbeam and multiqueue for another 20 minutes each (around 950k messages):
Best look (worst spike removed for each):
The additional data generally confirms the first run. Crossbeam is a titch slower at the far tail end (possibly measurement noise), and a titch faster around 99%.
Raw hdrhistogram log data for the second run: crossbeam-v0.2-latency-bench-2.gz
Final notes: I plan on submitting the benchmark as a pull request at some point, but need to clean it up and untangle some proprietary code from it first. Note, however, that it's unlikely you would get similar results running this on your laptop while doing other work.
Edit: realized after the fact that some of the charts label the unit as milliseconds. That's a mistake: all the measurements are in nanoseconds.
Nice benchmarks! One thing I noticed is that the multiqueue gist you posted here uses a blocking wait internally, so even though you only `try_recv`, the senders still have to deal with that (albeit post-send, so it's more of a throughput thing).
I suspect the latency differences near the tails come from minor implementation differences. multiqueue's broadcast, for example, has to explicitly track reader indices, and writer contention on reading/updating that internal view might be related to the difference around the 99th percentile, especially with such a small queue. I suspect that multiqueue winning at the far tail comes from crossbeam-channel having to write to multiple receivers to achieve broadcast, if I understand your benchmark right.
Multiqueue supports multiple writers, but most of the implementation is optimized for single writers with broadcast spsc using relatively large queue sizes.
I'm actually in the process of porting over some optimizations to crossbeam-channel that might give it the upper hand in your implementation.
Hey,
Just did a benchmark that's a bit different from what you have published in the crate and thought it would be helpful to you to have the results. This is a latency test, under the following conditions:
- receivers spin on `try_recv` and are (basically) the only activity on the core.
- originally tried to `clone` the `Receiver`, but then realized that messages would only go to one of the `clone`d instances, so used a `Vec<Receiver<_>>`, spinning on `try_send` one at a time. (Note: the legend in the benchmark is wrong on this front; what was really tested was a `Vec` of mpsc channels.)
- a message is a `chrono::DateTime<Utc>`, and when received, a receiving thread compares the sent time to a new call to `Utc::now()` and records the duration in nanos with an HdrHistogram (there is some measurement noise here, but best method I have found so far).
- also tested `multiqueue`'s `BroadcastSender` and `BroadcastReceiver` for comparison. `multiqueue` is modeled after the LMAX disruptor.

Here is a timeline/log histogram view of the results:
A couple other looks at the histogram:
Full results (should work with typical HdrHistogram log viewers when decompressed, "v2z" is extension I came up with, not sure what other people use): multiqueue-crossbeam-channel-tail-latency.v2z.gz
I appreciate that you guys have posted detailed benchmarks of your library compared to others as well as working on an improved channels implementation in general. Hope this helps you!