In Thrust this should work like the lexicographical sort example (https://github.com/thrust/thrust/blob/master/examples/lexicographical_sort.cu), but it should be tested against the plain sort example (https://github.com/thrust/thrust/blob/master/examples/sort.cu), since the keys are already integers as it is.
A cool side effect of the permutation method in the lexicographical sort example above is that permuting the counts as well (for graph correction) takes almost no extra work; see the sketch below.
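A rough sketch of that in Thrust, assuming the k-mers are laid out column-wise as 64-bit words (least-significant word first) with the counts in a parallel vector; the function and variable names here are just illustrative:

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/gather.h>
#include <cstdint>
#include <vector>

// One pass of the trick from examples/lexicographical_sort.cu: refine the
// permutation by one key column, using a stable sort so the ordering
// established by previously-processed columns is preserved.
void update_permutation(const thrust::device_vector<uint64_t> &column,
                        thrust::device_vector<uint32_t> &perm) {
  thrust::device_vector<uint64_t> temp(column.size());
  thrust::gather(perm.begin(), perm.end(), column.begin(), temp.begin());
  thrust::stable_sort_by_key(temp.begin(), temp.end(), perm.begin());
}

// Sort k-mers stored column-wise (least- to most-significant word) and carry
// the counts along by pushing them through the same permutation.
void sort_kmers_with_counts(const std::vector<thrust::device_vector<uint64_t>> &columns,
                            thrust::device_vector<uint32_t> &counts) {
  thrust::device_vector<uint32_t> perm(counts.size());
  thrust::sequence(perm.begin(), perm.end());

  for (const auto &col : columns)      // least-significant column first
    update_permutation(col, perm);

  // The cheap part: permuting the counts is just one extra gather.
  thrust::device_vector<uint32_t> counts_sorted(counts.size());
  thrust::gather(perm.begin(), perm.end(), counts.begin(), counts_sorted.begin());
  counts.swap(counts_sorted);
  // (Each k-mer column can be gathered the same way if the sorted keys are needed.)
}
```

(For the plain sort.cu baseline, with k-mers that fit in one word, a single thrust::sort_by_key over the packed keys with the counts as values does the same job directly.)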
Flag ideas: --quantize q (quantize the counts to q values) and --with-counts (defaulting to keeping the actual counts). Also needs a flag to help with graph construction, etc.
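For --quantize, the pass itself would just be one transform over the counts; the cap-at-q mapping below is only a placeholder for whatever bucketing scheme actually gets picked:

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <cstdint>

// Placeholder quantizer: cap counts at q. The real mapping (e.g. logarithmic
// buckets) is still to be decided; only the shape of the pass matters here.
struct quantize_count {
  uint32_t q;
  __host__ __device__ uint32_t operator()(uint32_t c) const {
    return c < q ? c : q;
  }
};

void quantize_counts(thrust::device_vector<uint32_t> &counts, uint32_t q) {
  thrust::transform(counts.begin(), counts.end(), counts.begin(),
                    quantize_count{q});
}
```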
Need to modify STXXL to support sorting hooks.
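STXXL has no such hook today, so the interface below is just a guess at what the modification might look like: a callback the patched sorter would invoke on each in-memory run in place of its internal run sorter, with the GPU path going through Thrust. Runs are bounded by STXXL's memory budget, so a run is assumed to fit on the GPU.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdint>
#include <functional>

// Hypothetical hook type: a patched stxxl sorter would call this on each
// in-memory run instead of its built-in run sorter.
using run_sort_hook = std::function<void(uint64_t *begin, uint64_t *end)>;

// GPU implementation of the hook: ship the run to the device, sort it with
// Thrust, copy it back, and let STXXL's merge phase proceed unchanged.
inline void gpu_sort_run(uint64_t *begin, uint64_t *end) {
  thrust::device_vector<uint64_t> d(begin, end);
  thrust::sort(d.begin(), d.end());
  thrust::copy(d.begin(), d.end(), begin);
}

// Wiring (also hypothetical): something like sorter.set_run_sort_hook(gpu_sort_run);
```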
Would be nice to compare this against Megahit, which uses GPU sorting but doesn't use STXXL, so it has an input limit dependent on RAM (I think), and it also partitions runs differently from how STXXL does.
The first version should just use the same amount of RAM as the GPU has. Later, if the user specifies more RAM than the video card has, we should set up a circular buffer so we can keep filling system memory while the GPU sorts runs (probably requires non-trivial changes to STXXL).
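A two-buffer sketch of that overlap (the circular buffer would generalise this to N host buffers): pinned host memory plus one CUDA stream per buffer, so the CPU fills one run while the GPU copies and sorts the other. fill_run/drain_run and the run size are stand-ins, not real API:

```cuda
#include <thrust/device_vector.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/sort.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

// Illustrative run size; in practice this would be derived from GPU memory.
constexpr size_t RUN_SIZE = size_t(1) << 26;

// fill_run: produce the next run of packed k-mers (returns 0 when done).
// drain_run: hand a sorted run on to the merge phase.
void sort_runs_double_buffered(size_t (*fill_run)(uint64_t *, size_t),
                               void (*drain_run)(const uint64_t *, size_t)) {
  uint64_t *host[2];
  cudaStream_t stream[2];
  thrust::device_vector<uint64_t> dev[2] = {
      thrust::device_vector<uint64_t>(RUN_SIZE),
      thrust::device_vector<uint64_t>(RUN_SIZE)};
  for (int i = 0; i < 2; ++i) {
    cudaHostAlloc(&host[i], RUN_SIZE * sizeof(uint64_t), cudaHostAllocDefault);
    cudaStreamCreate(&stream[i]);
  }

  size_t pending[2] = {0, 0};  // elements in flight per buffer (0 = idle)
  int cur = 0;
  size_t n;
  while ((n = fill_run(host[cur], RUN_SIZE)) > 0) {
    uint64_t *d = thrust::raw_pointer_cast(dev[cur].data());
    // Queue copy-in, sort, copy-out on this buffer's stream, then go straight
    // back to filling the other buffer while the GPU works on this one.
    cudaMemcpyAsync(d, host[cur], n * sizeof(uint64_t),
                    cudaMemcpyHostToDevice, stream[cur]);
    thrust::sort(thrust::cuda::par.on(stream[cur]),
                 dev[cur].begin(), dev[cur].begin() + n);
    cudaMemcpyAsync(host[cur], d, n * sizeof(uint64_t),
                    cudaMemcpyDeviceToHost, stream[cur]);
    pending[cur] = n;

    cur ^= 1;
    if (pending[cur] > 0) {  // wait for the older run, hand it off, reuse buffer
      cudaStreamSynchronize(stream[cur]);
      drain_run(host[cur], pending[cur]);
      pending[cur] = 0;
    }
  }
  for (int i = 0; i < 2; ++i) {
    if (pending[i] > 0) {
      cudaStreamSynchronize(stream[i]);
      drain_run(host[i], pending[i]);
    }
    cudaFreeHost(host[i]);
    cudaStreamDestroy(stream[i]);
  }
}
```

(Thrust's sort allocates its temporary storage internally, which may limit how much actually overlaps; CUB gives more control over that, which fits the plan below.)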
Use CUB (http://nvlabs.github.io/cub/) for the GPGPU path, and Thrust for everything else (TBB or OpenMP backends if a GPU isn't available):
https://github.com/thrust/thrust/wiki/Device-Backends
https://github.com/thrust/thrust/wiki/Host-Backends
https://github.com/thrust/thrust/wiki/Direct-System-Access
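The nice part of the Thrust fallback is that the same sort call retargets through the device-backend macro described on the Device Backends page above, so the multicore CPU path is mostly a build flag. Roughly (build lines follow the wiki; the file name is illustrative):

```cuda
// Same code, three backends, selected at compile time:
//   nvcc -O2 sort_kmers.cu                                               (CUDA)
//   g++  -O2 -x c++ sort_kmers.cu -fopenmp \
//        -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -lgomp          (OpenMP)
//   g++  -O2 -x c++ sort_kmers.cu \
//        -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB -ltbb           (TBB)
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdint>
#include <vector>

void sort_kmers(std::vector<uint64_t> &kmers) {
  // "device" here is the GPU, an OpenMP thread pool, or TBB, per the macro.
  thrust::device_vector<uint64_t> d(kmers.begin(), kmers.end());
  thrust::sort(d.begin(), d.end());
  thrust::copy(d.begin(), d.end(), kmers.begin());
}
```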
Probably easy enough to support a threaded radix sort on a multicore CPU (and on the GPU, though not as fast as CUB) as a first step.
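For the eventual CUB path, the sort itself is CUB's usual two-phase DeviceRadixSort call; a minimal sketch over packed 64-bit k-mers already resident on the device (names are illustrative):

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstdint>

// First call with a null temp pointer queries the scratch size CUB needs,
// the second call does the actual radix sort.
void cub_sort_kmers(const uint64_t *d_keys_in, uint64_t *d_keys_out, int n) {
  void *d_temp = nullptr;
  size_t temp_bytes = 0;
  cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_keys_in, d_keys_out, n);
  cudaMalloc(&d_temp, temp_bytes);
  cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_keys_in, d_keys_out, n);
  cudaFree(d_temp);
}
```

(cub::DeviceRadixSort::SortPairs would carry the counts along in the same call.)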