crossbeam-rs / crossbeam

Tools for concurrent programming in Rust
Apache License 2.0
7.49k stars, 470 forks

Performance degrades severely when the number of producers increases. #1119

Open bigKoki opened 5 months ago

bigKoki commented 5 months ago

In an MPSC performance test, Crossbeam performs much better than Java-Disruptor with 2 and 4 producers, but worse than Java-Disruptor with 8 and 16 producers. My test machine has 16 cores and 64 GB of RAM. What could be causing this?

Here are the results of two experiments, the first with 2 producers and the second with 8 producers. The x-axis is the buffer size and the y-axis is the elapsed time.

[Figure: elapsed time vs. buffer size, 2 producers] [Figure: elapsed time vs. buffer size, 8 producers]
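
The benchmark code itself is not included in the issue, so the following is only a hypothetical harness for the kind of test described: several producers, one consumer, a bounded buffer, and a wall-clock measurement. The function name `run_mpsc` and all parameter values are assumptions. It uses `std::sync::mpsc::sync_channel` as a stand-in so that it runs without extra dependencies; to measure Crossbeam, `crossbeam_channel::bounded` can be substituted for the channel constructor.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical MPSC benchmark: `producers` threads each send
/// `per_producer` messages through a bounded channel of size `capacity`.
/// Returns the number of messages received and the elapsed time.
fn run_mpsc(producers: usize, per_producer: usize, capacity: usize) -> (usize, Duration) {
    let (tx, rx) = sync_channel::<u64>(capacity);
    let start = Instant::now();

    let handles: Vec<_> = (0..producers)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..per_producer {
                    tx.send(i as u64).expect("receiver dropped");
                }
            })
        })
        .collect();

    // Drop the original sender so the receiving loop ends once
    // every producer has finished and dropped its clone.
    drop(tx);

    // Single consumer: drain the channel on the current thread.
    let received = rx.iter().count();

    for handle in handles {
        handle.join().expect("producer panicked");
    }
    (received, start.elapsed())
}

fn main() {
    for producers in [2, 4, 8, 16] {
        let (received, elapsed) = run_mpsc(producers, 100_000, 1024);
        println!("{producers} producers: {received} messages in {elapsed:?}");
    }
}
```

Timings from a sketch like this depend heavily on the machine, the buffer size, and how much work each message carries, so it is a starting point for reproducing the curves, not a substitute for the original test.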

al8n commented 2 months ago

My answer may not be correct. AFAIK, if you are using the unbounded channel, it is backed by a linked list of blocks, and each block can hold 32 elements. For example:

[block1(32 elements)] -> [block2(32 elements)]

Suppose your channel currently holds 32 elements, so the first block is full. When you send a new element, a new block has to be created (which requires an allocation), and if the consumer consumes this element immediately, the second block is destroyed. Since Rust does not have a garbage collector, if this happens repeatedly, the frequent allocation and deallocation may slow things down.
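
To get a feel for how often blocks would be allocated and freed in a layout like this, here is a single-threaded toy model. It is not Crossbeam's implementation: `BlockQueue`, `simulate`, and the block size of 32 (taken from the comment above) are all assumptions made for illustration. The model keeps about one block's worth of messages queued, then alternates one send with one receive while counting block allocations and frees.

```rust
use std::collections::VecDeque;

/// Hypothetical block capacity, taken from the comment above.
const BLOCK_CAP: usize = 32;

/// Toy FIFO queue stored as a list of fixed-capacity blocks.
struct BlockQueue {
    blocks: VecDeque<Vec<u64>>, // each Vec models one heap-allocated block
    head: usize,                // read offset into the front block
    allocs: usize,              // number of blocks allocated so far
    frees: usize,               // number of blocks freed so far
}

impl BlockQueue {
    fn new() -> Self {
        BlockQueue { blocks: VecDeque::new(), head: 0, allocs: 0, frees: 0 }
    }

    fn push(&mut self, value: u64) {
        // A new block is needed when there is no block yet or the tail is full.
        let tail_full = self.blocks.back().map_or(true, |b| b.len() == BLOCK_CAP);
        if tail_full {
            self.blocks.push_back(Vec::with_capacity(BLOCK_CAP));
            self.allocs += 1;
        }
        self.blocks.back_mut().unwrap().push(value);
    }

    fn pop(&mut self) -> Option<u64> {
        let value = *self.blocks.front()?.get(self.head)?;
        self.head += 1;
        // Free the front block once every slot in it has been read.
        if self.head == BLOCK_CAP {
            self.blocks.pop_front();
            self.frees += 1;
            self.head = 0;
        }
        Some(value)
    }
}

/// Pre-fill one block, then alternate send/receive `messages` times.
/// Returns (blocks allocated, blocks freed).
fn simulate(messages: usize) -> (usize, usize) {
    let mut queue = BlockQueue::new();
    for i in 0..BLOCK_CAP {
        queue.push(i as u64);
    }
    for i in 0..messages {
        queue.push(i as u64);
        queue.pop();
    }
    (queue.allocs, queue.frees)
}

fn main() {
    let (allocs, frees) = simulate(3200);
    println!("allocs = {allocs}, frees = {frees}");
}
```

With these parameters the model reports 101 allocations and 100 frees for 3,232 sends, so in a strict FIFO layout the cost is roughly one allocation and one free per 32 messages. Whether that is enough to matter in practice depends on the allocator and on contention, which this sketch does not model.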

bigKoki commented 2 months ago

Thanks for your answer. I'm using a bounded channel, but I still learned a lot from your explanation.