Performance against quicksort for large datasets?

Maxime2 / heapsort

heapsort implementations

BSD 2-Clause "Simplified" License

0 stars 4 forks source link

Thanks for making this available, and your blog post of course. I've been watching Robert Sedgewick's lectures on Algorithms (which is how I stumbled upon your blog :blush:), and while he praises the benefits of heapsort, he also mentions as a downside that it "makes poor use of cache memory". More specific:

... doesn't have local memory reference like quicksort does (always referring to something that's nearby something else that was just referred to), whereas heapsort is going to look far away from the current place as it goes down the tree, and that makes it slower in a lot of situations.

On your blog post you do mention it beats quicksort for certain input sizes. Have you tested it for very large datasets (spanning several pages of memory)? Very interested to know if its limited locality of reference has a non-trivial impact on performance.

I benchmarked the C++ version of bottom-up heapsort against libc++'s std::make_heap + std::sort_heap, but bottom-up heapsort was always slower for an std::vector<int> of one million elements. Considering that the modern quicksort variants are way faster than libc++'s heapsort (cache memory does indeed matter), it seems that the « beats quicksort for certain inpput sizes » isn't true anymore and that bottom-up heapsort, while somewhat interesting, does not have much of a future :/

Since bottom-up heapsort always goes to the end of the special path to find the special leaf, I guess that it has to reload the cache more often for big collections compared to the regular heapsort that lazily goes deeper and deeper. With modern computers, cache-friendliness seems to be the game changer :p

Maxime2 / heapsort

Performance against quicksort for large datasets? #1