byzhang / cudpp

Automatically exported from code.google.com/p/cudpp

CUDPP 2.1: Use Duane Merrill's sort directly rather than Thrust::sort #93

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Thrust::sort is nice, but it doesn't expose all of the performance available in 
Merrill's sort because it doesn't pre-allocate or pre-configure anything. As a 
result, the allocation and configuration overhead is significant for sorts of 
fewer than 1M elements. By using Duane's B40C code directly, we can 
pre-configure and pre-allocate temporaries as we used to. This should be faster 
for small sorts.
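To make the pre-allocation argument concrete, here is a minimal host-side C++ sketch. It is not B40C, Thrust, or CUDPP code. A hypothetical `PreallocatedRadixSorter` class (the name is ours) allocates its scratch buffer once in the constructor and reuses it on every `sort()` call, so repeated small sorts pay no per-call allocation. The LSD radix sort with 8-bit digits is an assumption chosen for illustration. Merrill's GPU sort is also a radix sort, but its digit width, pass structure, and tuning differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical "plan-style" sorter: scratch space is allocated once, up front,
// and reused by every sort() call. Host-side analogue only, not GPU code.
class PreallocatedRadixSorter {
public:
    // Reserve scratch space for sorts of up to max_elements keys.
    explicit PreallocatedRadixSorter(std::size_t max_elements)
        : scratch_(max_elements) {}

    // LSD radix sort of 32-bit unsigned keys, 8 bits per pass (4 passes).
    // Returns false and leaves keys untouched if they exceed the planned size.
    bool sort(std::vector<std::uint32_t>& keys) {
        const std::size_t n = keys.size();
        if (n > scratch_.size()) return false;  // would require a reallocation

        std::uint32_t* src = keys.data();
        std::uint32_t* dst = scratch_.data();
        for (int shift = 0; shift < 32; shift += 8) {
            // Histogram of the current 8-bit digit.
            std::array<std::size_t, 256> offsets{};
            for (std::size_t i = 0; i < n; ++i)
                ++offsets[(src[i] >> shift) & 0xFFu];

            // Exclusive prefix sum turns digit counts into output offsets.
            std::size_t sum = 0;
            for (auto& c : offsets) {
                const std::size_t count = c;
                c = sum;
                sum += count;
            }

            // Stable scatter into the other buffer, then swap roles.
            for (std::size_t i = 0; i < n; ++i)
                dst[offsets[(src[i] >> shift) & 0xFFu]++] = src[i];
            std::swap(src, dst);
        }
        // Four passes (an even number) leave the sorted result back in keys.
        return true;
    }

private:
    std::vector<std::uint32_t> scratch_;  // allocated once, in the constructor
};
```

Usage: construct one `PreallocatedRadixSorter sorter(1 << 20);` and call `sorter.sort(keys)` repeatedly; only the constructor allocates. This mirrors the idea of configuring and allocating temporaries once, instead of on every call.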

Original issue reported on code.google.com by harr...@gmail.com on 26 Jul 2011 at 10:25

GoogleCodeExporter commented 9 years ago
I'll handle this one.

Original comment by harr...@gmail.com on 26 Jul 2011 at 10:25

GoogleCodeExporter commented 9 years ago
You might also want to check out Sean Baxter's sort, which seems to be a bit 
faster than B40C:
http://www.moderngpu.com/sort/mgpusort.html

Original comment by m0b...@gmail.com on 29 Feb 2012 at 4:08

GoogleCodeExporter commented 9 years ago
Sean's work is great, and I'm definitely following it. But from a code 
maintenance point of view, Thrust is well supported by NVIDIA and is 
likely to be the performance leader for the long term (or rather, it has 
far more engineering resources behind it than one really capable guy), 
so it's probably the best long-term solution for the project.

Original comment by jow...@gmail.com on 29 Feb 2012 at 7:28