Closed arthurprs closed 7 years ago
Thanks Arthur. We have added a simple unit test for this at https://github.com/couchbase/forestdb/blob/master/tests/unit/mempool_test.cc but what we observe is that with your patch, there is nearly a 2X slowdown compared to the existing implementation with std::queue. I am not sure if this slowdown is due to the multiplication overhead or from the underlying STL implementation or something else. I am on OS X with AppleClang 7.3.0.7030031 Please feel free to add an unit test if you find that the stack based implementation shows better behavior with locality.
That's very good, I wanted some tests while writing this. Although, I think they're not really good as 97+% of the cpu time is spend inside rand() and the fill pattern it's not deterministic.
I updated the code a bit (second commit in this PR) trying to make it more repeatable and a bit less artificial. I had to increase the iterations a lot to mask the noise from spawning/joining the threads and the fill amount shouldn't be too long otherwise it's impossible to measure what we really want to measure.
stack
➜ build git:(master) ✗ tests/unit/mempool_test
basic test with 500000 runs of 8 bins x 1048576bytes in 391647 us PASSED
multi-thread test with 8 threads each with 500000 runs of 8 bins x 1048576bytes in 1107800 usec PASSED
queue
➜ build git:(bebf491) ✗ tests/unit/mempool_test
basic test with 500000 runs of 8 bins x 1048576bytes in 437160 us PASSED
multi-thread test with 8 threads each with 500000 runs of 8 bins x 1048576bytes in 1199590 usec PASSED
This is probably the best case scenario for deque as most stdlib implementations will fit 8 items per node just fine and won't cause any malloc/free. Also, it hardly stress locality, as the benchmark quickly puts everything into L3 and L2.
Thanks for the patch, Arthur! If this is your first code contribution to Couchbase, you'll need to fill out our Contributor License Agreement (http://review.couchbase.org/#/settings/agreements), which basically just declares that you created the code and are allowing us to release it under the Apache license.
I just filled it and rebased the patch.
Are you still interested in this? If yes I can rebase it and fix the conflicts.
Yes, we are, sorry for the delay, Arthur. As it turns out I don't have the permission to merge pull requests. But I have notified the ones who do. Thanks again
I've rebased the commits on top of master.
Improve cache locality by using a stack for indexes and a continuous block of memory.
Although it now requires a multiplication (at least preserving this API) the operations on the underlining vector are faster and have better locality than the previous underlining deque.