Morwenn / cpp-sort

Sorting algorithms & related tools for C++14
MIT License
621 stars 57 forks source link

ska_sort + Comparison Lambda #193

Closed victorstewart closed 2 years ago

victorstewart commented 2 years ago

any reason why ska_sort doesn't support a custom Comparison function?

i saw this issue on the ska_sort page mentioning it https://github.com/skarupke/ska_sort/issues/13

Morwenn commented 2 years ago

Yeah, ska_sort is simply not a comparison sort, but a radix sort derivative. As such, it requires more information about the elements than "how to compare two elements?" in order to sort them: namely it requires to know how to decompose the elements into parts (shifting for integers, indexing for arrays, etc.) in order to put them into buckets.

It seems that ska_sort works in O(n) time most of the time, which is already a big hint: comparison sorts can't do better than O(n log n) on average (even though they can run in O(n) for specific inputs). The original article describing ska_sort says in its FAQ:

So the worst case for radix sort is O(n*b) where b is the number of bytes that I have to read until I can tell all the elements apart. If you make me sort a lot of long strings, then the number of bytes can be quite large and radix sort may be slow.

This kind of complexity alone means it can't be a comparison sort, hence the lack of support for custom comparisons.


What the linked issue proposes isn't a comparison function since it doesn't compare pairs of elements, but a different kind of customization object to control the bucket classification. As to why I don't provide such a feature, there are several reasons:

Concerning the last point, I could theoretically add more customization points like this one besides comparisons and projections, but I would need to design it another way if I don't want to make the library more unmaintainable that it already is haha

victorstewart commented 2 years ago

thank you for the deeply thorough response! makes total sense now... by definition radix sorts are non-comparative.

any advise on the most performant non-stable sorter then? i'd teased out pdq_sort from the non-stable benchmarks.

i was trying to sort integral types within a class from greatest to least, so i decided to trick it by using negative values. but other cases where the comparisons are more complex.

Morwenn commented 2 years ago

If you're looking for the fastest unstable comparison sort in the library, then pdq_sort is probably the best you'll get generally. If you've got specificities, other sorters might be better but you'll and to test them yourself, there's unfortunately no silver bullet.

If you're trying to sort "by key", you can still use ska_sort with projections though:

struct MyStruct
{
    // Field you want to sort a collection of MyStruct by
    int key;
};

std::vector<MyStruct> vec = { /* ... */ };
cppsort::ska_sort(vec, &MyStruct::key);

Here I use a pointer to data member as a projection, but in this example you use any function taking a MyStruct as a parameter and returning a type sortable with ska_sort as a projection.

Morwenn commented 2 years ago

I'm going to close this issue for now. There are no plans to implement the proposed feature today, maybe in a distant future.