Open atadkase opened 4 years ago
Currently, cudaextender's output compression is synchronous. Explore dynamic parallelism, or a custom kernel to replace Thrust's stable sort, to make the tail end of cudaextender truly asynchronous.