Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

libc++ std::sort has O(n^2) worst case, standard mandates O(N log(N)) #20836

Closed Quuxplusone closed 2 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR20837
Status RESOLVED FIXED
Importance P normal
Reported by Orson Peters (orsonpeters@gmail.com)
Reported on 2014-09-02 13:40:12 -0700
Last modified on 2021-11-23 04:42:54 -0800
Version unspecified
Hardware All All
CC dushistov@gmail.com, eric@efcs.ca, fang@csl.cornell.edu, hfinkel@anl.gov, hiraditya@msn.com, jsweval@arxan.com, kutdanila@yandex.ru, llvm-bugs@lists.llvm.org, mclow.lists@gmail.com, nicolasweber@gmx.de, sebpop@gmail.com, zamazan4ik@tut.by
Fixed by commit(s)
Attachments main.cpp (1179 bytes, text/plain)
stdsort_introspective.patch (3759 bytes, text/plain)
Blocks
Blocked by
See also

Created attachment 12978 Evidence for quadratic worst case.

The C++ standard mandates that std::sort has complexity O(N log(N)), but libc++'s implementation takes O(N^2) in the worst case.

As proof I've attached a program that constructs a worst case input for several sizes. It also instruments the number of comparisons used to sort these worst cases and prints the results. The technique used is described in the 1999 paper "A Killer Adversary for Quicksort" by M. D. McIlroy.

Compiling and running:

$ clang++ -O2 -m64 -march=native -std=c++11 -stdlib=libc++ main.cpp -nodefaultlibs -lc++ -lc++abi -lm -lc -lgcc_s -lgcc && ./a.out N: comparisons 100: 2479 200: 10229 400: 40729 800: 161729 1600: 448698 3200: 1413898 6400: 5264298

This is clearly quadratic. To illustrate this defect more, these are the comparison counts given by compiling using libstdc++:

$ clang++ -O2 -m64 -march=native -std=c++11 main.cpp && ./a.out N: comparisons 100: 1742 200: 4260 400: 10035 800: 22784 1600: 51049 3200: 112386 6400: 244850

Inspecting the source of shows the cause of the issue: there is no introsort implemented, only quicksort (albeit with non-trivial optimizations, but nothing preventing the worst case). I've written a patch that augments the current implementation to make it work as an introspective sort, switching to heapsort if the recursion depth exceeds 2*floor(log(N)). This post can only have one attachment, so I'll post it as an attachment to a comment.

Having not contributed to libc++ before I'm not 100% familiar with all coding styles/(un)written rules. My changes are rather minimal though, so I've just followed what style could be found in context. And of course I knowingly submit my code under the libc++ licenses, the MIT license and the UIUC License.

Quuxplusone commented 10 years ago

Attached main.cpp (1179 bytes, text/plain): Evidence for quadratic worst case.

Quuxplusone commented 10 years ago

Attached stdsort_introspective.patch (3759 bytes, text/plain): A patch that fixes the quadratic behaviour by implementing introsort.

Quuxplusone commented 10 years ago

Hi Orson,

Please mail this patch to the cfe-commits list for review. Please make sure the patch is sent as an attachment to the e-mail (not inline text), and please make sure the repeat the background information you've included here in the e-mail itself.

For more information on the development policy and workflow, please see: http://llvm.org/docs/DeveloperPolicy.html

Thanks!

Quuxplusone commented 10 years ago
Hi Hal,

I've sent the patch to the mailing list, where it is waiting for moderator
approval since I'm not a member of the list.

Greetings,

Orson Peters
Quuxplusone commented 9 years ago

An update. EricWF is working on a set of benchmarks for std::sort and other algorithms/containers. He expects to finish this quite soon, and I'll be using that to make sure this doesn't introduce a performance regression - and will be adding this to the test suite.

So I haven't forgotten about it.

Quuxplusone commented 8 years ago
(In reply to comment #4)
> An update. EricWF is working on a set of benchmarks for std::sort and other
> algorithms/containers. He expects to finish this quite soon, and I'll be
> using that to make sure this doesn't introduce a performance regression -
> and will be adding this to the test suite.
>
> So I haven't forgotten about it.

What's the status on this?
Quuxplusone commented 8 years ago
I think std::sort() is still broken in trunk as of today.  std::sort() should
not call the compare function on the same element: it cannot be O(N log N) if
it does.

Related bug: https://llvm.org/bugs/show_bug.cgi?id=28551

Reduced testcase:
$ cat a.cc
#include <algorithm>
#include <vector>
#include <cassert>

int main(int argc, char** argv) {
  int size = 100;
  std::vector<int> v(size);

  // Elements in v are unique.
  for (int i = 0; i < size; ++i)
    v[i] = i;

  std::sort(v.begin(), v.end(),
            [&](int x, int y) { assert(x != y); return x < y; });

  return 0;
}
$ clang++ -std=c++11 -stdlib=libc++ a.cc -o a.out
$ ./a.out
a.out: a.cc:14: auto main(int, char **)::(anonymous class)::operator()(int,
int) const: Assertion `x != y' failed.
./go.sh: line 5: 19447 Aborted                 (core dumped) ./a.out

$ clang++ -std=c++11 -stdlib=libstdc++ a.cc -o a.out
$ ./a.out
Works fine with libstdc++.
Quuxplusone commented 8 years ago

_Bug 28551 has been marked as a duplicate of this bug._

Quuxplusone commented 8 years ago
I would like to point out that our std::sort implementation is between 2x and
5x faster on already sorted inputs than libstdc++ is.

Some benchmark results:
Run on (4 X 4228.3 MHz CPU s)

// libc++ //
Benchmark                                       Time           CPU Iterations
-----------------------------------------------------------------------------
BM_Sort/random_uint32/1024                 170614 ns     174629 ns       4375
BM_Sort/sorted_ascending_uint32/1024        12902 ns      13276 ns      53030
BM_Sort/sorted_descending_uint32/1024       20919 ns      21712 ns      29661
BM_Sort/single_element_uint32/1024          13747 ns      14384 ns      60345
BM_Sort/random_strings/1024               1097028 ns     998514 ns        673
BM_Sort/sorted_ascending_strings/1024      233688 ns     208469 ns       3070
BM_Sort/sorted_descending_strings/1024     341642 ns     361934 ns       1923
BM_Sort/single_element_strings/1024       1081605 ns    1126143 ns        547

// libstdc++ //
Benchmark                                       Time           CPU Iterations
-----------------------------------------------------------------------------
BM_Sort/random_uint32/1024                 156709 ns     161200 ns       4268
BM_Sort/sorted_ascending_uint32/1024        49431 ns      53600 ns      10000
BM_Sort/sorted_descending_uint32/1024       42401 ns      41201 ns      16990
BM_Sort/single_element_uint32/1024          32115 ns      30892 ns      20588
BM_Sort/random_strings/1024                878349 ns     796434 ns        673
BM_Sort/sorted_ascending_strings/1024      576253 ns     532000 ns       1000
BM_Sort/sorted_descending_strings/1024     506059 ns     368390 ns       1683
BM_Sort/single_element_strings/1024       2453953 ns    2542751 ns        269

The benchmarks used can be found here: https://reviews.llvm.org/D22240
Quuxplusone commented 8 years ago

Eric Fiselier, can you run that same benchmark with the 'block' branch of my pattern-defeating quicksort algorithm? https://github.com/orlp/pdqsort/tree/block

I've been waiting for a while now on this benchmark suite to start discussion of integrating the algorithm to libc++, I wasn't aware you finished the benchmark.

Quuxplusone commented 8 years ago
(In reply to comment #8)
> Benchmark                                       Time           CPU Iterations
> -----------------------------------------------------------------------------
> BM_Sort/random_uint32/1024                 170614 ns     174629 ns       4375
> BM_Sort/sorted_ascending_uint32/1024        12902 ns      13276 ns      53030
[...]

Could you please measure the number of instructions executed, data reads, and
data writes, instead of the number of nano seconds of execution?  I think those
are a more stable metrics than the execution time that may vary with the state
of the machine, daemons, noise, etc.

$ valgrind --tool=cachegrind ./a.out
$ cg_annotate cachegrind.out
Quuxplusone commented 6 years ago

I double Orson's request about using pdqsort as internal algorithm for std::sort. You can see here, how pdqsort is fast (https://github.com/orlp/pdqsort).

Quuxplusone commented 2 years ago

Fixed with https://reviews.llvm.org/D113413, should land in llvm 14

If you have stability sorting guarantees that you cannot fix and need migration, use -D_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY or -D_LIBCPP_DEBUG=1 from https://reviews.llvm.org/D96946 to randomize the input range before sorting