I noticed that with your current example Makefile, kernel launch failures occur when the example kernel is not compiled with a machine type consistent with Julia's Cuint.
A very basic fix is to ensure that line 3 in arrays.jl reads:
typealias CUdeviceptr Uint64
And that the kernel in examples/makefile is compiled with:
nvcc -ptx -m64 vadd.cu
For example, printing the CUdeviceptr in cualloc via:
function cualloc(T::Type, len::Integer)
    a = CUdeviceptr[0]
    nbytes = int(len) * sizeof(T)
    @cucall(:cuMemAlloc, (Ptr{CUdeviceptr}, Csize_t), a, nbytes)
    println(CuPtr(a[1]))
    return CuPtr(a[1])
end
on my machine prints CuPtr(0x204c0400), a 32-bit address,
whereas running ex1.jl under cuda-memcheck shows:
cuda-memcheck julia ex1.jl
...
...
...
========= Invalid __global__ read of size 4
========= at 0x00000068 in vadd
========= by thread (99,0,0) in block (0,0,0)
========= Address 0x7f8e204c018c is out of bounds
...
...
...
CuDriverError(700): Kernel launch failed
The address shown is 64-bit. Over multiple runs it's clear that the lower 32 bits of the address are correct and the upper 32 bits are undefined.