ferdymercury / RayTraceDicom

Sub-second pencil beam dose calculation on GPU for adaptive proton therapy
GNU General Public License v3.0

Fix index out of array #2

Open ferdymercury opened 5 years ago

ferdymercury commented 5 years ago

https://github.com/ferdymercury/RayTraceDicom/commit/5c7b54d64b6e0e4146188609b57105484dbd15b8

ferdymercury commented 5 years ago

Illegal access to address (@shared)0xdc0 detected.
Thread 1 "RayTraceDicom" received signal CUDA_EXCEPTION_1, Lane Illegal Address.
CUDA Exception: Warp Out-of-range Address
Thread 1 "RayTraceDicom" received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 0, grid 17, block (29,0,0), thread (0,7,0), device 0, sm 0, warp 15, lane 0]
0x00000000011a3450 in void kernelSuperposition<6>(float const*, float const*, float*, int, int2*, int, int*)<<<(182,1,1),(32,8,1)>>> ()

These are the constants:

const unsigned int maxSuperpR = 32; ///< Largest superposition radius in pixels
const int superpTileX = 32;         ///< Must be equal to warp size!
const int superpTileY = 8;          ///< Desktop and laptop
const int minTilesInBatch = 16;     ///< Minimum number of tiles in each KS batch

I use a Tesla K40

It's weird: I just found out that if I add the following line, the kernel runs through:

if(((row+i)*(superpTileX+2*rad) + threadIdx.x+j)>=(superpTileX+2*rad)*(superpTileY+2*rad)) printf("Dummy %d, f=%f\n", threadIdx.x, i);///<@todo fix this bug
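To make the bound in that check concrete, here is a minimal host-side sketch of the tile arithmetic. It rests on assumptions not confirmed in this thread: that the template argument of kernelSuperposition<6> is the superposition radius rad, that the shared tile holds float values, and that the tile starts at shared-memory offset 0. The helper names (paddedTileSize, tileIndex, inTile, tileByteOffset) are hypothetical and not taken from the repository.

```cpp
#include <cassert>
#include <cstddef>

// Tile constants copied from the issue.
constexpr int superpTileX = 32;
constexpr int superpTileY = 8;

// Number of elements in a tile padded by `rad` pixels on every side
// (the right-hand side of the debug check in the issue).
constexpr int paddedTileSize(int rad) {
    return (superpTileX + 2 * rad) * (superpTileY + 2 * rad);
}

// Flat index from the left-hand side of the debug check.
// `tx` stands in for threadIdx.x; row, i and j are as in the issue.
constexpr int tileIndex(int row, int i, int tx, int j, int rad) {
    return (row + i) * (superpTileX + 2 * rad) + tx + j;
}

// True if `idx` addresses an element inside the padded tile.
constexpr bool inTile(int idx, int rad) {
    return idx >= 0 && idx < paddedTileSize(rad);
}

// Hypothetical helper: byte offset of a valid tile element,
// assuming float elements and a tile that starts at offset 0.
inline std::size_t tileByteOffset(int idx, int rad) {
    assert(inTile(idx, rad));
    return static_cast<std::size_t>(idx) * sizeof(float);
}

// For rad = 6 the padded tile is 44 x 20 = 880 floats = 3520 bytes.
static_assert(paddedTileSize(6) == 880, "44 * 20 elements");
// 3520 == 0xdc0: the first byte past the end of the tile.
static_assert(static_cast<std::size_t>(paddedTileSize(6)) * sizeof(float) == 0xdc0u,
              "one-past-the-end byte offset for rad = 6");
```

Under these assumptions, the reported address (@shared)0xdc0 is exactly the first byte past the end of the rad = 6 tile. For example, tileIndex(19, 1, 0, 0, 6) evaluates to 880, one row beyond the last padded row. This is consistent with an off-by-one-row access, but it does not by itself show which values of row, i and j the kernel actually used.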

ferdymercury commented 5 years ago

cuda-memcheck also detects this bug when run with the racecheck tool. This is the report; apparently there is a race between a read and a write on the same address.

/usr/local/cuda/bin/cuda-memcheck --tool racecheck  --leak-check full ./RayTraceDicom
========= WARN: Race reported between Write access at 0x00001170 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=1>(float const *, float const *, float*, int, int2*, int, int*)
=========     and Read access at 0x00001138 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=1>(float const *, float const *, float*, int, int2*, int, int*) [249984 hazards]
=========
========= Internal Memcheck Error: Detected racecheck error buffer overflow. Some records have been dropped.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x21ae68]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x12188f]
=========
========= WARN: Race reported between Write access at 0x00001170 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=2>(float const *, float const *, float*, int, int2*, int, int*)
=========     and Read access at 0x00001138 in RayTraceDicom/kernel_wrapper.cuh:437:void kernelSuperposition<int=2>(float const *, float const *, float*, int, int2*, int, int*) [436883 hazards]
=========
double free or corruption (out)
========= Error: process didn't terminate successfully
========= No CUDA-MEMCHECK results found

ferdymercury commented 5 years ago

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#independent-thread-scheduling-7-x