In Thrust this should work like the lexicographical sort example (https://github.com/thrust/thrust/blob/master/examples/lexicographical_sort.cu), but it should be tested against the plain sort example (https://github.com/thrust/thrust/blob/master/examples/sort.cu), since the keys are already integers as it is.
A cool side effect of the permutation method in the lexicographical sort example above is that permuting the counts as well (for graph correction) takes almost no extra work; see the sketch below.
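A rough sketch of that in Thrust, assuming the k-mers are laid out column-wise as 64-bit words (least-significant word first) with the counts in a parallel vector; the function and variable names here are just illustrative:

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/gather.h>
#include <cstdint>
#include <vector>

// One pass of the trick from examples/lexicographical_sort.cu: refine the
// permutation by one key column, using a stable sort so the ordering
// established by previously-processed columns is preserved.
void update_permutation(const thrust::device_vector<uint64_t> &column,
                        thrust::device_vector<uint32_t> &perm) {
  thrust::device_vector<uint64_t> temp(column.size());
  thrust::gather(perm.begin(), perm.end(), column.begin(), temp.begin());
  thrust::stable_sort_by_key(temp.begin(), temp.end(), perm.begin());
}

// Sort k-mers stored column-wise (least- to most-significant word) and carry
// the counts along by pushing them through the same permutation.
void sort_kmers_with_counts(const std::vector<thrust::device_vector<uint64_t>> &columns,
                            thrust::device_vector<uint32_t> &counts) {
  thrust::device_vector<uint32_t> perm(counts.size());
  thrust::sequence(perm.begin(), perm.end());

  for (const auto &col : columns)      // least-significant column first
    update_permutation(col, perm);

  // The cheap part: permuting the counts is just one extra gather.
  thrust::device_vector<uint32_t> counts_sorted(counts.size());
  thrust::gather(perm.begin(), perm.end(), counts.begin(), counts_sorted.begin());
  counts.swap(counts_sorted);
  // (Each k-mer column can be gathered the same way if the sorted keys are needed.)
}
```

(For the plain sort.cu baseline, with k-mers that fit in one word, a single thrust::sort_by_key over the packed keys with the counts as values does the same job directly.)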
Flag ideas: --quantize q (quantize the counts to q values) and --with-counts (defaulting to keeping the actual counts). Also needs a flag to help with graph construction, etc.
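For --quantize, the pass itself would just be one transform over the counts; the cap-at-q mapping below is only a placeholder for whatever bucketing scheme actually gets picked:

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <cstdint>

// Placeholder quantizer: cap counts at q. The real mapping (e.g. logarithmic
// buckets) is still to be decided; only the shape of the pass matters here.
struct quantize_count {
  uint32_t q;
  __host__ __device__ uint32_t operator()(uint32_t c) const {
    return c < q ? c : q;
  }
};

void quantize_counts(thrust::device_vector<uint32_t> &counts, uint32_t q) {
  thrust::transform(counts.begin(), counts.end(), counts.begin(),
                    quantize_count{q});
}
```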
Need to modify STXXL to support sorting hooks.
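STXXL has no such hook today, so the interface below is just a guess at what the modification might look like: a callback the patched sorter would invoke on each in-memory run in place of its internal run sorter, with the GPU path going through Thrust. Runs are bounded by STXXL's memory budget, so a run is assumed to fit on the GPU.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdint>
#include <functional>

// Hypothetical hook type: a patched stxxl sorter would call this on each
// in-memory run instead of its built-in run sorter.
using run_sort_hook = std::function<void(uint64_t *begin, uint64_t *end)>;

// GPU implementation of the hook: ship the run to the device, sort it with
// Thrust, copy it back, and let STXXL's merge phase proceed unchanged.
inline void gpu_sort_run(uint64_t *begin, uint64_t *end) {
  thrust::device_vector<uint64_t> d(begin, end);
  thrust::sort(d.begin(), d.end());
  thrust::copy(d.begin(), d.end(), begin);
}

// Wiring (also hypothetical): something like sorter.set_run_sort_hook(gpu_sort_run);
```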
Would be nice to compare this against Megahit, which uses GPU sorting but doesn't use STXXL, so it has an input limit dependent on RAM (I think), and it also partitions runs differently from how STXXL does.
The first version should just use the same amount of RAM as the GPU has. Later, if the user specifies more RAM than the video card has, we should set up a circular buffer so we can keep filling system memory while the GPU sorts runs (probably requires non-trivial changes to STXXL).
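A two-buffer sketch of that overlap (the circular buffer would generalise this to N host buffers): pinned host memory plus one CUDA stream per buffer, so the CPU fills one run while the GPU copies and sorts the other. fill_run/drain_run and the run size are stand-ins, not real API:

```cuda
#include <thrust/device_vector.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/sort.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

// Illustrative run size; in practice this would be derived from GPU memory.
constexpr size_t RUN_SIZE = size_t(1) << 26;

// fill_run: produce the next run of packed k-mers (returns 0 when done).
// drain_run: hand a sorted run on to the merge phase.
void sort_runs_double_buffered(size_t (*fill_run)(uint64_t *, size_t),
                               void (*drain_run)(const uint64_t *, size_t)) {
  uint64_t *host[2];
  cudaStream_t stream[2];
  thrust::device_vector<uint64_t> dev[2] = {
      thrust::device_vector<uint64_t>(RUN_SIZE),
      thrust::device_vector<uint64_t>(RUN_SIZE)};
  for (int i = 0; i < 2; ++i) {
    cudaHostAlloc(&host[i], RUN_SIZE * sizeof(uint64_t), cudaHostAllocDefault);
    cudaStreamCreate(&stream[i]);
  }

  size_t pending[2] = {0, 0};  // elements in flight per buffer (0 = idle)
  int cur = 0;
  size_t n;
  while ((n = fill_run(host[cur], RUN_SIZE)) > 0) {
    uint64_t *d = thrust::raw_pointer_cast(dev[cur].data());
    // Queue copy-in, sort, copy-out on this buffer's stream, then go straight
    // back to filling the other buffer while the GPU works on this one.
    cudaMemcpyAsync(d, host[cur], n * sizeof(uint64_t),
                    cudaMemcpyHostToDevice, stream[cur]);
    thrust::sort(thrust::cuda::par.on(stream[cur]),
                 dev[cur].begin(), dev[cur].begin() + n);
    cudaMemcpyAsync(host[cur], d, n * sizeof(uint64_t),
                    cudaMemcpyDeviceToHost, stream[cur]);
    pending[cur] = n;

    cur ^= 1;
    if (pending[cur] > 0) {  // wait for the older run, hand it off, reuse buffer
      cudaStreamSynchronize(stream[cur]);
      drain_run(host[cur], pending[cur]);
      pending[cur] = 0;
    }
  }
  for (int i = 0; i < 2; ++i) {
    if (pending[i] > 0) {
      cudaStreamSynchronize(stream[i]);
      drain_run(host[i], pending[i]);
    }
    cudaFreeHost(host[i]);
    cudaStreamDestroy(stream[i]);
  }
}
```

(Thrust's sort allocates its temporary storage internally, which may limit how much actually overlaps; CUB gives more control over that, which fits the plan below.)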
Use CUB (http://nvlabs.github.io/cub/) for the GPGPU path, and Thrust for everything else (TBB or OpenMP backends if a GPU isn't available):
https://github.com/thrust/thrust/wiki/Device-Backends
https://github.com/thrust/thrust/wiki/Host-Backends
https://github.com/thrust/thrust/wiki/Direct-System-Access
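The nice part of the Thrust fallback is that the same sort call retargets through the device-backend macro described on the Device Backends page above, so the multicore CPU path is mostly a build flag. Roughly (build lines follow the wiki; the file name is illustrative):

```cuda
// Same code, three backends, selected at compile time:
//   nvcc -O2 sort_kmers.cu                                               (CUDA)
//   g++  -O2 -x c++ sort_kmers.cu -fopenmp \
//        -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -lgomp          (OpenMP)
//   g++  -O2 -x c++ sort_kmers.cu \
//        -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB -ltbb           (TBB)
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdint>
#include <vector>

void sort_kmers(std::vector<uint64_t> &kmers) {
  // "device" here is the GPU, an OpenMP thread pool, or TBB, per the macro.
  thrust::device_vector<uint64_t> d(kmers.begin(), kmers.end());
  thrust::sort(d.begin(), d.end());
  thrust::copy(d.begin(), d.end(), kmers.begin());
}
```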
Probably easy enough to support a threaded radix sort on a multicore CPU (and on the GPU, though not as fast as CUB) as a first step.
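For the eventual CUB path, the sort itself is CUB's usual two-phase DeviceRadixSort call; a minimal sketch over packed 64-bit k-mers already resident on the device (names are illustrative):

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstdint>

// First call with a null temp pointer queries the scratch size CUB needs,
// the second call does the actual radix sort.
void cub_sort_kmers(const uint64_t *d_keys_in, uint64_t *d_keys_out, int n) {
  void *d_temp = nullptr;
  size_t temp_bytes = 0;
  cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_keys_in, d_keys_out, n);
  cudaMalloc(&d_temp, temp_bytes);
  cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_keys_in, d_keys_out, n);
  cudaFree(d_temp);
}
```

(cub::DeviceRadixSort::SortPairs would carry the counts along in the same call.)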