Open hitblackjack opened 4 years ago
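DeviceMatrix.cu, line 381:
`if (tid < warpSize) { for (size_t shift = warpSize; shift > 0; shift >>= 1) sdata[tid] += sdata[tid + shift];`
There is no `__syncthreads()` between the loop iterations here, so the shared-memory array `sdata` should be declared `volatile`. Otherwise the compiler is free to keep `sdata[tid]` in a register, and threads in the warp may not see each other's updates.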
reference: https://stackoverflow.com/questions/21205471/cuda-in-warp-reduction-and-volatile-keyword
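A minimal sketch of what the fix could look like, not the actual code from DeviceMatrix.cu. The names `warpReduce`, `blockSum`, `sumOnDevice` and `kBlockSize` are hypothetical, and the block is assumed to be one-dimensional with a power-of-two size of at least 64. The `volatile` qualifier is the change proposed above; the `__syncwarp()` calls are an extra precaution I would assume is needed on Volta and newer GPUs, where threads of a warp are no longer guaranteed to run in lockstep.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size: a power of two, at least 64.
constexpr int kBlockSize = 256;

// Final-warp reduction. `volatile` forces every access to sdata to go to
// shared memory instead of being cached in a register.
__device__ void warpReduce(volatile float* sdata, int tid) {
    for (int shift = warpSize; shift > 0; shift >>= 1) {
        float v = sdata[tid + shift];  // read the partner's partial sum
        __syncwarp();                  // all reads finish before any write
        sdata[tid] += v;
        __syncwarp();                  // all writes finish before the next read
    }
}

// Each block sums up to kBlockSize elements into out[blockIdx.x].
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float sdata[kBlockSize];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction across warps; this part needs block-wide barriers.
    for (int shift = blockDim.x / 2; shift > warpSize; shift >>= 1) {
        if (tid < shift) sdata[tid] += sdata[tid + shift];
        __syncthreads();
    }

    // Only warp 0 continues, so __syncthreads() is not used from here on.
    if (tid < warpSize) warpReduce(sdata, tid);
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Host helper (error checking omitted for brevity; assumes a non-empty input).
float sumOnDevice(const std::vector<float>& h) {
    int n = static_cast<int>(h.size());
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    blockSum<<<blocks, kBlockSize>>>(d_in, d_out, n);
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    float total = 0.0f;
    for (float p : partial) total += p;  // combine per-block sums on the host
    return total;
}
```

On recent architectures a shuffle-based reduction with `__shfl_down_sync` avoids shared memory in the last warp altogether, but the version above stays closer to the loop quoted in this issue.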