Poor performance compared to cppsort::verge_sort()

andyg24 commented 4 years ago

Hello Morwenn,

First, thank you for your excellent work on these sort implementations.

For some reason, when I populate a vector with 100 million random uint32_t values, and sort it using

vergesort::vergesort(data.begin(), data.end(), std::less())

the runtime is about 3x greater than with

cppsort::verge_sort(data, std::less())

(10.2 seconds vs 3.4 seconds, gcc version 9.3.0, -O3). At a first glance, the two implementations should be very similar.

Is there a reason for the large discrepancy? The standalone vergesort would be preferable in my case as it comes in a single .h file and would be easier to integrate with my project.

Thanks!

Morwenn commented 4 years ago

The difference surely comes from the fact that cpp-sort reimplements a bunch of standard library algorithms, has a lot of additional micro-optimizations that might improve integer sorting, and takes advantage of C++11 and C++14 features, and even newer standard features when possible — I would have a hard time .

It could be possible to do it for the standalone version of vergesort, it would add a significant amount of code which I didn't feel was worth it for a standalone project: if you at the cpp-sort source code you will see that almost every standard library algorithm used is reimplemented. To be honest this standalone version is more of a demonstrator with different goals: I always try to maximize the execution speed in cpp-sort, while the standalone sorting algorithm projects tend to favour a reasonable code size and some backward compatibility, sometimes up to C++03. I didn't really expect that anyone would try to use it in a serious project.

What kind of pattern does you data have prior to sorting? Depending on the answer, the speed difference can be explained differently.

Morwenn commented 4 years ago

The explanation was probably way more stupid than I originally thought: the random-access vergesort overload in the single-header version was called not_vergesort, which is a leftover of recent tests that I used to test the bidirectional version on random-access collections. So obviously the random-access vergesort was never called, and the bidirectional version was used instead. The bidirectional overload can't "jump" through the collection like the random-access overload does, which might explain most of the speed difference you experienced.

andyg24 commented 4 years ago

The data was populated with random integers: data.push_back(rand());

The new version now also runs in about 3.5 seconds, or 3x faster than before! Thank you very much for the quick fix.

Morwenn commented 4 years ago

Ok, I'm glad it was just that :)

It somehow slipped under my radar because it was correct on my computer but otherwise acted as if correctly synchronized with the GitHub version for some reason, despite them being different.

I'm closing this issue, thanks a lot for the catch!

Morwenn commented 4 years ago

By the way if you want more trivia about the speed difference: basically the random-access vergesort skips O(log n) elements when it finds sorted runs that aren't big enough to be considered (sorted or reverse-sorted runs with fewer than log n elements), but the bidirectional vergesort has to scan linearly. When the collection is truly shuffled you can expect the random-access vergesort to notice really fast that there aren't enough sorted elements and basically to perform log n jumps really fast, which means that it only considers O(log n) elements before switching to pdqsort, while the bidirectional vergesort has to consider O(n) elements either way then falls back to a quicksort which isn't as optimized as pdqsort can be.

This roughly explains your speed difference: more time spent in the pre-sorting phase before switching to a slower algorithm.

Morwenn / vergesort

Poor performance compared to cppsort::verge_sort() #13