NULLx76 / ringbuffer

A fixed-size circular buffer written in Rust.
https://crates.io/crates/ringbuffer
MIT License

How can we optimize the ring buffer performance? #123

Closed: codehobbyist06 closed this issue 1 year ago

codehobbyist06 commented 1 year ago

Hi everyone, I have been comparing the performance of DPDK rings with AllocRingBuffer and have noticed that the results differ significantly.

Here are some stats:

| Buffer | Cycles to enqueue | Cycles to dequeue |
|---|---|---|
| DPDK ring | 4 | 6 |
| Rust ring (AllocRingBuffer) | 41 | 68 |

As you can see, there is a difference of roughly 10x, which makes the Rust ring significantly slower. Is there any way to optimize the ring buffer's performance further, for example with compiler flags? Also, one difference between AllocRingBuffer and the DPDK ring is that AllocRingBuffer does not have any bulk enqueue or dequeue APIs. Are there any plans to implement such APIs?

jdonszelmann commented 1 year ago

Hi!

I am curious how you measured these statistics. Not at all because I don't believe them; I'm certain there are still ways to optimise ringbuffer. Did you use our own benchmark and extrapolate that to a number of cycles, or did you measure it yourself? If you measured it yourself, what compiler flags did you use?

Looking at DPDK, their rings are even atomic (i.e. they work in multithreaded contexts). Although I'm sure they optimised that very well, I'm surprised that it's faster (especially under contention) than our few bit operations, a single branch (of which they seem to have plenty as well) and a write. Though maybe I'm looking at the wrong code. Which code did you benchmark?
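To make the comparison concrete, here is a minimal sketch of the kind of non-atomic push being described: with a power-of-two capacity, the slot index is a single bitwise AND on a free-running counter, so a push is one comparison, one branch and one write. The type `MaskRing` and its methods are hypothetical, written only for illustration; this is not the ringbuffer crate's source, and the sketch assumes that pushing into a full buffer overwrites the oldest element.

```rust
/// Hypothetical single-threaded ring buffer with power-of-two capacity.
/// Illustration only; not the ringbuffer crate's implementation.
pub struct MaskRing<T> {
    buf: Vec<Option<T>>,
    mask: usize,  // capacity - 1; valid as an index mask because capacity is a power of two
    read: usize,  // free-running read counter (wraps on overflow)
    write: usize, // free-running write counter (wraps on overflow)
}

impl<T> MaskRing<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        let mut buf = Vec::with_capacity(capacity);
        buf.resize_with(capacity, || None);
        MaskRing { buf, mask: capacity - 1, read: 0, write: 0 }
    }

    pub fn len(&self) -> usize {
        // Correct even after the counters wrap, since capacity divides 2^usize::BITS.
        self.write.wrapping_sub(self.read)
    }

    /// Pushes `value`, overwriting the oldest element when the buffer is full.
    pub fn push(&mut self, value: T) {
        if self.len() == self.buf.len() {
            // Full: advance the read counter so the oldest slot gets overwritten.
            self.read = self.read.wrapping_add(1);
        }
        let idx = self.write & self.mask; // one AND instead of a modulo
        self.buf[idx] = Some(value);
        self.write = self.write.wrapping_add(1);
    }

    /// Removes and returns the oldest element, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.len() == 0 {
            return None;
        }
        let idx = self.read & self.mask;
        self.read = self.read.wrapping_add(1);
        self.buf[idx].take()
    }
}
```

Because it needs `&mut self`, a structure like this can only be shared between threads behind some synchronisation, which is where an atomic design such as DPDK's does extra work.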

About bulk insert APIs: possibly! We're developing ringbuffer very much part time, so it's something I'd love to do if I find the time, but it could take another month or so and I can't promise much. I made an issue for it: #124
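A bulk API can be sketched on top of a flat buffer by copying a slice in at most two contiguous runs (up to the end of the storage, then wrapping to the start) instead of one element at a time. `BulkRing`, `push_slice` and `pop_into` are hypothetical names, not part of ringbuffer. The sketch assumes `Copy + Default` elements and a push that enqueues only as many items as fit and never overwrites; that is a design choice for illustration, similar in spirit to a "burst" enqueue, not a description of what ringbuffer or DPDK actually do.

```rust
/// Hypothetical bounded ring of Copy elements with bulk operations.
pub struct BulkRing<T: Copy + Default> {
    buf: Vec<T>,
    head: usize, // index of the oldest element
    len: usize,  // number of stored elements
}

impl<T: Copy + Default> BulkRing<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        BulkRing { buf: vec![T::default(); capacity], head: 0, len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Copies as many items from `src` as fit; returns how many were enqueued.
    pub fn push_slice(&mut self, src: &[T]) -> usize {
        let cap = self.buf.len();
        let n = src.len().min(cap - self.len);
        let tail = (self.head + self.len) % cap;
        // First run: from `tail` up to the end of the storage.
        let first = n.min(cap - tail);
        self.buf[tail..tail + first].copy_from_slice(&src[..first]);
        // Second run: whatever is left wraps around to the start.
        self.buf[..n - first].copy_from_slice(&src[first..n]);
        self.len += n;
        n
    }

    /// Copies up to `dst.len()` items out; returns how many were dequeued.
    pub fn pop_into(&mut self, dst: &mut [T]) -> usize {
        let cap = self.buf.len();
        let n = dst.len().min(self.len);
        // First run: from `head` up to the end of the storage.
        let first = n.min(cap - self.head);
        dst[..first].copy_from_slice(&self.buf[self.head..self.head + first]);
        // Second run: the wrapped part at the start of the storage.
        dst[first..n].copy_from_slice(&self.buf[..n - first]);
        self.head = (self.head + n) % cap;
        self.len -= n;
        n
    }
}
```

`copy_from_slice` lets the compiler emit a `memcpy` per run, which is where a bulk API would save work compared to pushing elements one by one.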

codehobbyist06 commented 1 year ago

I measured the statistics by simply reading the time register before and after the operations. Both rings were compared in the same environment, so I doubt anything is wrong there. Regarding flags: I am relatively new to Rust, so I have not tried any optimization flags as such, but I would like to know which ones I can use for better performance. Also, for the benchmark, the enqueue and dequeue operations run in different DPDK threads, so that both rings are tested under the same conditions. I would really appreciate your input on how the ring's performance could be optimized.
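Reading a time register around a single enqueue mostly measures the cost of reading the clock itself, so a common approach is to time a long loop and divide. The sketch below does that with `std::time::Instant` rather than a cycle counter (an assumption made for portability) and uses `VecDeque` purely as a stand-in for the ring under test; `average_per_op` and `bench_vecdeque_push` are hypothetical helpers. It is single-threaded: if the real benchmark shares one buffer between threads, whatever synchronisation wraps it (for example a `Mutex`) is part of what gets measured.

```rust
use std::collections::VecDeque;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `op` `iters` times and returns the average wall-clock time per call.
/// Timing a whole loop amortises the overhead of reading the clock.
pub fn average_per_op<F: FnMut()>(iters: u32, mut op: F) -> Duration {
    assert!(iters > 0, "iters must be non-zero");
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed() / iters
}

/// Average cost of an overwrite-when-full push on a VecDeque,
/// used here only as a stand-in for the ring under test.
pub fn bench_vecdeque_push(iters: u32) -> Duration {
    const CAP: usize = 1024;
    let mut q: VecDeque<u64> = VecDeque::with_capacity(CAP);
    average_per_op(iters, || {
        if q.len() == CAP {
            q.pop_front(); // drop the oldest element when full
        }
        // black_box keeps the optimiser from deleting the work being measured.
        q.push_back(black_box(42));
    })
}
```

Run it from a `--release` build; in a debug build the numbers mostly reflect unoptimised code.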

jdonszelmann commented 1 year ago

I see. Well, the first thing you can try is compiling with --release if you haven't done that yet. That might make a large difference. What further advice I can give depends a bit on what you've already tried.

codehobbyist06 commented 1 year ago

Yes, I compiled with the --release flag for benchmarking. Apart from that, I did not use any optimization flags on the Rust side; the optimization settings are mostly the defaults.

jdonszelmann commented 1 year ago

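--release optimises with optimisation level 3, so that's good. Another option you can try is enabling LTO (link-time optimisation), for example by setting lto = true in the [profile.release] section of Cargo.toml.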

codehobbyist06 commented 1 year ago

OK, sure. I will try that. Also, are there any performance optimizations available on the ring side, for example ring configurations that could be used?

jdonszelmann commented 1 year ago

There isn't any configuration you can pass to a ringbuffer to make it faster. ConstGenericRingBuffer is a bit faster than AllocRingBuffer, but whether that's useful to you depends on your use case. One more thing to know is that RingBuffer stores full elements, not references to elements. If your elements are large, they're copied into and out of the RingBuffer. You could instead have a ringbuffer of 'static references (or whichever lifetime is available to you), and your performance may change.
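A quick way to see the cost being described is to compare the size of an element stored by value with the size of a reference to it: a push or pop moves roughly `size_of::<T>()` bytes, although the optimiser may elide some copies. `Packet` is a hypothetical 2 KiB element chosen for illustration.

```rust
use std::mem::size_of;

/// Hypothetical large element, e.g. a packet payload.
pub struct Packet {
    pub data: [u8; 2048],
}

/// Approximate bytes moved per push/pop when the ring stores `Packet` by value.
pub fn bytes_per_op_by_value() -> usize {
    size_of::<Packet>()
}

/// Approximate bytes moved per push/pop when the ring stores `&'static Packet`.
pub fn bytes_per_op_by_ref() -> usize {
    size_of::<&'static Packet>()
}
```

On a 64-bit target that is 2048 bytes versus 8 bytes per operation.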

codehobbyist06 commented 1 year ago

OK, sure. I will check whether ConstGenericRingBuffer could be useful. Thanks a lot for the info :)

I did want to keep static references in the ring buffer, but since the ring needs to own the objects it contains (that's a requirement on my side), I don't think I can do much there. However, I'd be glad to know if there is some way to transfer ownership of the objects to the ring while avoiding copying the complete data chunk back and forth.

jdonszelmann commented 1 year ago

How do you do that in your C version? If I saw correctly, it also mainly stores pointers, doesn't it?

jdonszelmann commented 1 year ago

With Box you can allocate first (which is expensive) and then pass only the pointer around. If you're passing the same objects around over and over again, that may be worth it. You can also allocate in other places: rather than one heap allocation per object, maybe in an arena/bump allocator.
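A minimal sketch of the `Box` approach, assuming a hypothetical 4 KiB `Packet` and using `VecDeque` as a stand-in for the ring: ownership of the packet moves into and out of the queue, but only the pointer-sized `Box` is copied. `round_trip` is a hypothetical helper that checks this by comparing the payload's heap address before and after.

```rust
use std::collections::VecDeque;

/// Hypothetical large payload.
pub struct Packet {
    pub data: [u8; 4096],
}

/// Moves a boxed packet into a fresh queue and back out again. Returns the
/// packet plus whether its heap address is unchanged (i.e. no payload copy).
pub fn round_trip(packet: Box<Packet>) -> (Box<Packet>, bool) {
    let mut ring: VecDeque<Box<Packet>> = VecDeque::with_capacity(8);
    let before: *const Packet = &*packet;
    ring.push_back(packet); // ownership moves in; only the Box pointer is copied
    let out = ring.pop_front().expect("the queue was just pushed to");
    let after: *const Packet = &*out;
    (out, before == after)
}
```

The one expensive step is the allocation in `Box::new`; after that, every hop through the ring costs a pointer copy.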

jdonszelmann commented 1 year ago

If your references really are 'static, ringbuffer doesn't need ownership; it just depends on the generic type you're using. If you make a RingBuffer::<&'static T>, the RingBuffer doesn't need ownership, only references.
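One way to obtain `&'static T` references, sketched here under the assumption that the objects are allocated once and must live for the rest of the program, is `Box::leak`. The queue then holds only references and never owns the packets. `leak_pool`, `cycle` and `Packet` are hypothetical, and `VecDeque` again stands in for the ring; the leaked memory is never freed, so this only fits buffers that genuinely live forever.

```rust
use std::collections::VecDeque;

/// Hypothetical large payload.
pub struct Packet {
    pub id: u32,
    pub data: [u8; 4096],
}

/// Allocates `n` packets once and leaks them so they live for the rest of
/// the program, yielding `&'static Packet` references. Never freed.
pub fn leak_pool(n: u32) -> Vec<&'static Packet> {
    (0..n)
        .map(|id| {
            let p: &'static Packet = Box::leak(Box::new(Packet { id, data: [0; 4096] }));
            p
        })
        .collect()
}

/// A queue of `&'static Packet` does not own the packets; pushing and
/// popping copies only a pointer-sized reference. Returns the ids in order.
pub fn cycle(pool: &[&'static Packet]) -> Vec<u32> {
    let mut ring: VecDeque<&'static Packet> = VecDeque::with_capacity(pool.len());
    for &p in pool {
        ring.push_back(p);
    }
    ring.into_iter().map(|p| p.id).collect()
}
```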

codehobbyist06 commented 1 year ago

Yes, the C version also stores pointers, but my application is a bit different: in Rust, I have wrapper objects around the pointers that are passed around. I guess I can consider just passing references around and keeping ownership of the objects in some separate collector, such as a bump allocator. Thanks a lot for your input.
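The "keep ownership elsewhere, pass references around" idea can also be sketched with an arena that owns the objects and a ring that carries small index handles. `Arena`, `Handle` and `enqueue_all` are hypothetical names for illustration, not a specific crate's API, and `VecDeque` stands in for the ring. This minimal arena never frees individual objects; a real one (or a bump allocator crate) would handle reuse differently.

```rust
use std::collections::VecDeque;

/// Hypothetical handle: an index into the arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(usize);

/// Minimal arena that owns every object; callers pass `Handle`s around
/// instead of the objects themselves.
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Moves `value` into the arena (one move, at insertion time).
    pub fn alloc(&mut self, value: T) -> Handle {
        self.items.push(value);
        Handle(self.items.len() - 1)
    }

    /// Borrows the object behind a handle.
    pub fn get(&self, h: Handle) -> &T {
        &self.items[h.0]
    }
}

/// The arena keeps ownership; the ring only carries pointer-sized handles.
pub fn enqueue_all<T>(arena: &mut Arena<T>, ring: &mut VecDeque<Handle>, values: Vec<T>) {
    for v in values {
        ring.push_back(arena.alloc(v));
    }
}
```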

jdonszelmann commented 1 year ago

No worries!