Optimizing Numpy's quicksort by simd sort.

Qiyu8 commented 3 years ago

Hi, Thanks for your awesome work, The accelerate of quick sort is of great value in many areas, As a member of Numpy, I want to import simd sort algorithm here to Numpy by using universal intrinsics, as far as I can see, avx2_quicksort.cpp is the best candidate to integrated, @WojciechMula @lemire Any suggestions would be appreciated!

lemire commented 3 years ago

Can you elaborate on what you want to sort ? The data type is important.

WojciechMula commented 3 years ago

I'm afraid the algorithms in this repository are not in a ready-to-use state. The performance depends on data distribution, IIRC sorting large arrays of values from a tiny set (like 10k elements but only 7 different values) is slow.

Qiyu8 commented 3 years ago

@lemire The following data type is sorted in Numpy:

Numpy type	C type	Description
np.bool_	bool	Boolean (True or False) stored as a byte
np.byte	signed char	Platform-defined
np.ubyte	unsigned char	Platform-defined
np.short	short	Platform-defined
np.ushort	unsigned short	Platform-defined
np.intc	int	Platform-defined
np.uintc	unsigned int	Platform-defined
np.int_	long	Platform-defined
np.uint	unsigned long	Platform-defined
np.longlong	long long	Platform-defined
np.ulonglong	unsigned long long	Platform-defined
np.half / np.float16		Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
np.single	float	Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa
np.double	double	Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa.
np.longdouble	long double	Platform-defined extended-precision float
np.csingle	float complex	Complex number, represented by two single-precision floats (real and imaginary components)
np.cdouble	double complex	Complex number, represented by two double-precision floats (real and imaginary components).
np.clongdouble	long double complex	Complex number, represented by two extended-precision floats (real and imaginary components).

but I can set an allow-list to decide which data type is suitable for sorting, currently the simd sort is prepared for uint if I'm not mistaken.

@WojciechMula Thanks for your advice, there is no perfect algorithm in the world, quicksort is usually the fastest, but the worst case scenario is O(N^2) so in Numpy It's switches to the O(NlogN) worst case heapsort if not enough progress is made on the large side of the two quicksort partitions. This improves the worst case while still retaining the speed of quicksort for the common case, on the other hand, Numpy uses insert sorting for small arrays. Sorting large arrays of values from a tiny set is a special scenario that may handled separately, Do you have any plan to improve simd sort code to make it more practical? :smile:

WojciechMula commented 3 years ago

@Qiyu8 We don't actively work on this topic, it somehow faded out. But I wish we could do something. :)

BTW, you may find this project interesting: https://github.com/damageboy/vxsort-cpp

WojciechMula / simd-sort

Optimizing Numpy's quicksort by simd sort. #6