gerben-s / quicksort-blog-post

https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubble-sorts-comeback.html
Apache License 2.0
39 stars 7 forks source link

Bad behavior if all elements of array are equal #1

Closed gsjaardema closed 4 years ago

gsjaardema commented 4 years ago

I pulled this into my code to see how it compared to a sort implementation I was using and found that the tests were hanging. After a little investigation, it wasn't hanging but was just performing very badly for the cases where all elements are equal. Looks like it partiions into subarrays of size 1 and n-1 and continues that all the way down or until it blows up due to too much recursion.

I haven't looked at any fixes, but wanted to know whether that was a known issue with the experimental code or whether that behavior had been overlooked.

Although the test is easy to reproduce, here is the code for completeness:

#include <hybrid_qsort.h>
#include <vector>
int main()
{
  //  std::vector<int> v(131071,1);
  std::vector<int> v(1024,1);
  exp_gerbens::QuickSort(v.data(), v.size());

}
dumblob commented 4 years ago

@gsjaardema have you also tried other pathological inputs (e.g. as linked in the Andrei's blog post)?

gerben-s commented 4 years ago

This code is only focused as illustration on how to improve sort algorithms. This is not meant as a quality implementation. For that it needs the usual improvement to QuickSort. You can test it on unique elements and achieve good behavior except for special crafted input that also triggers N^2.

gsjaardema commented 4 years ago

OK, that makes sense.

gerben-s commented 4 years ago

I added a fallback to handle this case. I think that the code is now reasonably good to handle real world data. Of course it's not securely randomized so it's still possible to construct adversarial input to produce quadratic behavior, but duplicates should be handled well.

gsjaardema commented 4 years ago

Works much better on the all values equal case. I still get a large slowdown in my test code for the case known as "stagger" which is:

        case stagger:
          for (i=0; i < n; i++) {
            x[i] = (i * m + i) % n;
          }
          break;

When n = 131071 and m = 1 or 131072. Finishes, but definitely pauses at those points.

Not looking for a fix necessarily, just providing info.