ferdymercury opened this issue 5 years ago.
Illegal access to address (@shared)0xdc0 detected.
Thread 1 "RayTraceDicom" received signal CUDA_EXCEPTION_1, Lane Illegal Address.
CUDA Exception: Warp Out-of-range Address
Thread 1 "RayTraceDicom" received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 0, grid 17, block (29,0,0), thread (0,7,0), device 0, sm 0, warp 15, lane 0]
0x00000000011a3450 in void kernelSuperposition<6>(float const*, float const*, float*, int, int2*, int, int*)<<<(182,1,1),(32,8,1)>>> ()
These are the constants:
const unsigned int maxSuperpR = 32; ///< Largest superposition radius in pixels
const int superpTileX = 32; ///< Must be equal to warp size!
const int superpTileY = 8; ///< Desktop and laptop
const int minTilesInBatch = 16; ///< Minimum number of tiles in each KS batch
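A quick size check follows. It assumes three things: the kernel's shared tile is a float array of (superpTileX+2*rad) x (superpTileY+2*rad) elements, as the bound in the workaround condition further down suggests; the tile starts at shared offset 0; and the template argument of kernelSuperposition<6> is the radius. The helper names tileWidth, tileHeight, tileElems and tileBytes are hypothetical. Under those assumptions, the tile for rad = 6 is 44 x 20 floats = 3520 bytes, which matches the reported address (@shared)0xdc0. That address would be the first byte past the end of the tile.

```cpp
#include <cstddef>

// Constants quoted in the issue.
constexpr unsigned int maxSuperpR = 32;
constexpr int superpTileX = 32;
constexpr int superpTileY = 8;

// Hypothetical helpers. Assumption: the shared tile is a float array of
// (superpTileX + 2*rad) x (superpTileY + 2*rad) elements at shared offset 0.
constexpr int tileWidth(int rad)  { return superpTileX + 2 * rad; }
constexpr int tileHeight(int rad) { return superpTileY + 2 * rad; }
constexpr int tileElems(int rad)  { return tileWidth(rad) * tileHeight(rad); }
constexpr std::size_t tileBytes(int rad) { return tileElems(rad) * sizeof(float); }

// kernelSuperposition<6>: 44 x 20 = 880 floats = 3520 bytes = 0xdc0,
// so the reported address is the first byte after the tile.
static_assert(tileElems(6) == 880, "44 * 20 floats");
static_assert(tileBytes(6) == 0xdc0, "first out-of-range shared offset for rad = 6");

// Largest radius: 96 x 72 floats = 27648 bytes.
static_assert(tileBytes(maxSuperpR) == 27648, "tile size for maxSuperpR");
```

If other __shared__ variables precede the tile, the offsets shift and this match could be a coincidence.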
I am using a Tesla K40.
It's weird: I just found out that if I add the following check
if(((row+i)*(superpTileX+2*rad) + threadIdx.x+j)>=(superpTileX+2*rad)*(superpTileY+2*rad)) printf("Dummy %d, f=%f\n", threadIdx.x, i);///<@todo fix this bug
then the program runs through without any error.
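The workaround condition tests the flattened shared index (row+i)*(superpTileX+2*rad) + threadIdx.x+j against the tile size. Because it is needed at all, some iteration of the surrounding loop probably accesses memory past the end of the tile. One common way to rule that out is to make the strided loop bounds themselves stop at the tile edge. The code below is a host-side C++ model of such a guarded cooperative load for a 32 x 8 block. It is a hypothetical sketch, not the kernel in kernel_wrapper.cuh, and guardedLoadCoversTileOnce is an illustrative name. At compile time, it checks two things: every element of the padded tile is written exactly once, and no index leaves [0, w*h).

```cpp
// Block dimensions from the issue (blockDim = (32, 8, 1)).
constexpr int superpTileX = 32;
constexpr int superpTileY = 8;

// Hypothetical model of a cooperative load of a padded tile into shared memory.
// Each thread (tx, ty) writes tile elements in a strided pattern; the loop
// conditions row < h and col < w keep every index inside the tile.
// Returns true if each element is written exactly once and no index is out of range.
template <int Rad>
constexpr bool guardedLoadCoversTileOnce() {
    constexpr int w = superpTileX + 2 * Rad;  // padded tile width
    constexpr int h = superpTileY + 2 * Rad;  // padded tile height
    int writes[w * h] = {};                   // host-side stand-in for the shared tile
    for (int ty = 0; ty < superpTileY; ++ty) {
        for (int tx = 0; tx < superpTileX; ++tx) {
            for (int row = ty; row < h; row += superpTileY) {
                for (int col = tx; col < w; col += superpTileX) {
                    const int idx = row * w + col;
                    if (idx < 0 || idx >= w * h) return false;  // would be an out-of-range access
                    ++writes[idx];
                }
            }
        }
    }
    for (int k = 0; k < w * h; ++k) {
        if (writes[k] != 1) return false;  // element missed or written twice
    }
    return true;
}

static_assert(guardedLoadCoversTileOnce<6>(), "rad = 6 tile is covered exactly once");
static_assert(guardedLoadCoversTileOnce<32>(), "maxSuperpR tile is covered exactly once");
```

In a real kernel, the inner statement would be the shared-memory store. A __syncthreads() would then follow before any thread reads elements written by its neighbours.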
The racecheck tool of cuda-memcheck also detects this bug. This is the report, which apparently shows a race between a read and a write on the same address:
/usr/local/cuda/bin/cuda-memcheck --tool racecheck --leak-check full ./RayTraceDicom
========= WARN: Race reported between Write access at 0x00001170 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=1>(float const *, float const *, float*, int, int2*, int, int*)
========= and Read access at 0x00001138 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=1>(float const *, float const *, float*, int, int2*, int, int*) [249984 hazards]
=========
========= Internal Memcheck Error: Detected racecheck error buffer overflow. Some records have been dropped.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x21ae68]
========= Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x12188f]
=========
========= WARN: Race reported between Write access at 0x00001170 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=2>(float const *, float const *, float*, int, int2*, int, int*)
========= and Read access at 0x00001138 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=2>(float const *, float const *, float*, int, int2*, int, int*) [436883 hazards]
=========
double free or corruption (out)
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found
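racecheck reports pairs of shared-memory accesses to the same address when they meet three conditions: they come from different threads, at least one is a write, and no barrier orders them. The toy model below counts such pairs for a few hand-written accesses. It is not how the tool is implemented, and Access and countHazards are hypothetical names. It illustrates why a write and a read reported on the same source line (kernel_wrapper.cuh:437) count as hazards unless a __syncthreads() separates them.

```cpp
// Toy model of a shared-memory hazard check (not racecheck's actual algorithm).
struct Access {
    int thread;   // thread index inside the block
    int address;  // shared-memory byte offset
    bool write;   // true for a write, false for a read
    int epoch;    // number of __syncthreads() barriers executed before this access
};

// Counts pairs of accesses by different threads to the same address in the
// same barrier epoch where at least one access is a write.
template <int N>
constexpr int countHazards(const Access (&acc)[N]) {
    int hazards = 0;
    for (int a = 0; a < N; ++a) {
        for (int b = a + 1; b < N; ++b) {
            const bool conflict = acc[a].thread != acc[b].thread &&
                                  acc[a].address == acc[b].address &&
                                  acc[a].epoch == acc[b].epoch &&
                                  (acc[a].write || acc[b].write);
            if (conflict) ++hazards;
        }
    }
    return hazards;
}

// Thread 0 writes offset 0x40 and thread 1 reads it with no barrier in between...
constexpr Access noBarrier[] = {{0, 0x40, true, 0}, {1, 0x40, false, 0}};
// ...and the same pair with one __syncthreads() between the write and the read.
constexpr Access withBarrier[] = {{0, 0x40, true, 0}, {1, 0x40, false, 1}};

static_assert(countHazards(noBarrier) == 1, "write/read race without a barrier");
static_assert(countHazards(withBarrier) == 0, "barrier separates the write from the read");
```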
https://github.com/ferdymercury/RayTraceDicom/commit/5c7b54d64b6e0e4146188609b57105484dbd15b8