Closed JPenuchot closed 5 years ago
Examples are on the way; I'll tidy them up into unit tests.
The test is located in blazetest/utiltest/algorithms/cuda_reduce.h
To run it:
cd blazetest
make src/utiltest/algorithms/cuda_reduce.run
CUDATransform seems to be broken too. I'll look into it.
Fixed
blaze::CUDAReduce doesn't work for large sizes. I've been unable to find the source of the bug for days now and I'm running out of ideas. Above a certain threshold, the CUDA reduce kernel (the __global__ function) starts outputting inaccurate values. I've tried to pinpoint the issue and to add synchronization directives, but nothing seems to help.
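One way to narrow down the threshold mentioned above is to sweep over input sizes and compare the reduction against a sequential reference. The snippet below is only a host-side sketch under stated assumptions: `first_failing_size` and `blocked_sum` are hypothetical names that are not part of Blaze, and `blocked_sum` is a CPU stand-in for the device reduction, not the actual kernel. It uses all-ones integer input so that any mismatch points to a logic or synchronization error rather than floating-point rounding.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Blaze): returns the smallest tested size at
// which `reduce` disagrees with a sequential std::accumulate reference, or 0
// if every tested size matches. Sizes are powers of two up to max_size.
template <typename Reduce>
std::size_t first_failing_size(Reduce reduce, std::size_t max_size)
{
    for (std::size_t n = 1; n <= max_size; n *= 2) {
        // All-ones integer input: the expected sum is exactly n, so a mismatch
        // cannot be explained by floating-point rounding.
        std::vector<long long> data(n, 1);
        const long long expected = std::accumulate(data.begin(), data.end(), 0LL);
        if (reduce(data) != expected) {
            return n;
        }
    }
    return 0;
}

// CPU stand-in for the reduction under test (an assumption for illustration):
// sums fixed-size blocks first, then sums the per-block partial results.
inline long long blocked_sum(const std::vector<long long>& data)
{
    constexpr std::size_t block = 256;
    std::vector<long long> partial;
    for (std::size_t i = 0; i < data.size(); i += block) {
        const std::size_t end = std::min(i + block, data.size());
        partial.push_back(std::accumulate(data.begin() + i, data.begin() + end, 0LL));
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling `first_failing_size(blocked_sum, 1u << 16)` returns 0 here because the stand-in is correct. Swapping in a wrapper around the real reduction would report the first power-of-two size at which the result diverges, which may help relate the threshold to the block or grid dimensions.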