NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
BSD 3-Clause "New" or "Revised" License
1.68k stars 447 forks source link

Support for custom types in cub::DeviceRadixSort? #52

Closed maltese closed 1 year ago

maltese commented 8 years ago

I couldn't find a better place to ask this question, so pardon me for asking here:

Is there a way to use keys of non-standard types for cub::DeviceRadixSort?

So far I managed to sort arrays of ints, floats, doubles, longs, bools, etc.

But what I would like to sort an array where each item is a custom type:

typedef struct {
  char data[32]
} LongLongLong;

Unfortunately I get the following error when I try to do that:

../cub/device/dispatch/dispatch_radix_sort.cuh(254): error: class "cub::Traits<LongLongLong>" has no member "UnsignedBits"

Should I maybe add a NumericTraits for my custom type? How would I go about that?

Thank you.

maltese commented 8 years ago

What we are basically trying to do is to sort 32-byte long strings. Would it be possible?

3gx commented 8 years ago

Sorting 32 byte strings will take a lot longer than sorting 4 byte ints with radix sort. 8x more bits to sort and 8x more data to move around per pass , netting to 64x increase of work when compared to int

Comparison sort should fare better in this case. work increase is 8x (modulo comparison operator), because 8x more data will have to be moved around per pass, but the number of passes don't depend on the data type.

Consider thrust::sort for this, which uses merge sort for non primitive data types, but you'll need to provide (optimized) comparison operator. It should work but it won't be fast.

maltese commented 8 years ago

Thank you egaburov for the reply. I understand the performance reasoning.

However, an important thing that I forgot to mention is that I would only need to use a 4 byte substring piece of the whole 32 byte string to sort the list. Therefore we do not have 8x more bits to sort, but only 8x more data to move around.

Would this be possible?

Regardless, would you still go with merge sort instead?

Another thing I could think of is to add one pass before the sorting to create a list of just these 4byte substrings along with the indices pointing to the original 32byte string. After that I would scatter the 32byte strings according to the sorted data.

Would appreciate to hear your opinion.

3gx commented 8 years ago

Your proposed solution sounds okay to me, if char[4] can be treated as unsigned int. Use SortPair in this case, with the value in (key,value) pair being either the corresponding index or the whole 32 byte string that will be move around at ever pass. Try both and see which has lower runtime in your case.

daktfi commented 7 years ago

Let me bump this a little. Is there any way to sort with key being some custom very wide integer type, like 128 bit or so? I do know how the radix sort algorithm works, and know that there are no problem algorithm-wise, only need for some extra support from/for such custom type. I'm not all that good with modern c++ features (too much old-school low-level hardware coding), so not quite grasp what is needed and how to better implement it. BTW, there are very interesting types already: (u)int3, (u)int4 and (u)long3 and (u)long4 - these can be used for multiprecision calculations and if they can be used as key to sorting... Well, we need to declare the order of components, but that is very small restriction given the possible benefits.

sh1ng commented 7 years ago

👍 for RadixSort with custom type. In my case it would be nice to sort by two fields int and float. So I'm created the structure

{
int fields1,
float fields2
}

but I'm getting the same error.

gevtushenko commented 1 year ago

This issue is addressed by https://github.com/NVIDIA/cub/pull/671