Morwenn / cpp-sort

Sorting algorithms & related tools for C++14
MIT License
621 stars 57 forks source link

slab_sort fails to sort some inputs #211

Closed Morwenn closed 2 years ago

Morwenn commented 2 years ago

It somehow had not occurred before, but the new split_adapter tests occasionally fail with slab_sort and no other sorter. I have to investigate further, but so far I observed that, after the first split_adapter pass, the left part of the collection is correctly sorter so it seems unlikely that the issue comes from split_adapter.

Next step is: isolate a failing split_sorter(slab_sort) input, check that everything is normal with the right part of the input after the first split_adapter pass, then run slab_sort alone on it to confirm that the issue really comes from there. Once it is confirmed, find the bug in slab_sort and fix it.

Morwenn commented 2 years ago

I can reproduce the bug with the following test case:

std::list<int> collection = {
    391,392,397,237,9,238,284,56,376,16,141,266,255,126,245,34,95,146,251,301,336,241,
    149,110,255,215,257,38,147,294,263,302,67,85,265,253,32,264,267,42,151,92,155,152,
    395,18,161,296,163,329,63,166,177,100,169,259,375,199,213,38,237,388,98,294,41,178,
    4,176,390,108,128,187,13,304,185,295,43,188,256,72,242,312,199,250,313,314,21,116,
    195,366,300,20,323,118,264,55,77,285,331,392,164,208,186,328,329,379,5,12,205,265,
    207,388,319,127,209,282,328,346,347,126,24,83,57,354,211,148,357,212,8,216,157,362,
    188,83,131,366,27,106,131,147,68,370,326,374,94,376,371,289,225,243,227,274,188,384,
    188,228,28,232,325,86,100,187,393,363,395,355,253
};

There are a few inversions somewhere in the result, where the sequence 188,188,195,188,188 appears.

It isn't really minimal, yet it is enough to draw the following conclusions:

Considering that several components are reused in other algorithms and have not failed so far, the most likely candidates for failure are:

Morwenn commented 2 years ago

As mentioned in the previous commit message, the bug was that the algorithm used stable_partition to partition the collection around the median, but then recursed in the two halves of the collection instead of recursing into the partitions created by stable_partition. The elements in the resulting left are smaller than the media, and those in the right partition are greater or equal, which happens to match the "split in two halves" behaviour when all elements are distinct - this explains why the bug wasn't encountered before. When several elements compare equivalent to the median, those partitions differ.

The algorithm was changed to recurse in the partitions created by stable_partition, which fixed the bug.