WojciechMula / simd-sort

AVX512F and AVX2 versions of quick sort
BSD 2-Clause "Simplified" License
105 stars 13 forks source link

Optimizing Numpy's quicksort by simd sort. #6

Open Qiyu8 opened 3 years ago

Qiyu8 commented 3 years ago

Hi, Thanks for your awesome work, The accelerate of quick sort is of great value in many areas, As a member of Numpy, I want to import simd sort algorithm here to Numpy by using universal intrinsics, as far as I can see, avx2_quicksort.cpp is the best candidate to integrated, @WojciechMula @lemire Any suggestions would be appreciated!

lemire commented 3 years ago

Can you elaborate on what you want to sort ? The data type is important.

WojciechMula commented 3 years ago

I'm afraid the algorithms in this repository are not in a ready-to-use state. The performance depends on data distribution, IIRC sorting large arrays of values from a tiny set (like 10k elements but only 7 different values) is slow.

Qiyu8 commented 3 years ago

@lemire The following data type is sorted in Numpy:

Numpy type C type Description
np.bool_ bool Boolean (True or False) stored as a byte
np.byte signed char Platform-defined
np.ubyte unsigned char Platform-defined
np.short short Platform-defined
np.ushort unsigned short Platform-defined
np.intc int Platform-defined
np.uintc unsigned int Platform-defined
np.int_ long Platform-defined
np.uint unsigned long Platform-defined
np.longlong long long Platform-defined
np.ulonglong unsigned long long Platform-defined
np.half / np.float16   Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
np.single float Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa
np.double double Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa.
np.longdouble long double Platform-defined extended-precision float
np.csingle float complex Complex number, represented by two single-precision floats (real and imaginary components)
np.cdouble double complex Complex number, represented by two double-precision floats (real and imaginary components).
np.clongdouble long double complex Complex number, represented by two extended-precision floats (real and imaginary components).

but I can set an allow-list to decide which data type is suitable for sorting, currently the simd sort is prepared for uint if I'm not mistaken.

@WojciechMula Thanks for your advice, there is no perfect algorithm in the world, quicksort is usually the fastest, but the worst case scenario is O(N^2) so in Numpy It's switches to the O(NlogN) worst case heapsort if not enough progress is made on the large side of the two quicksort partitions. This improves the worst case while still retaining the speed of quicksort for the common case, on the other hand, Numpy uses insert sorting for small arrays. Sorting large arrays of values from a tiny set is a special scenario that may handled separately, Do you have any plan to improve simd sort code to make it more practical? :smile:

WojciechMula commented 3 years ago

@Qiyu8 We don't actively work on this topic, it somehow faded out. But I wish we could do something. :)

BTW, you may find this project interesting: https://github.com/damageboy/vxsort-cpp