Kernel launch failure when CUdeviceptr and .ptx type are inconsistent

I noticed that with your current example Makefile, kernel launch failures occur when the example kernel is not compiled with a machine type consistent with Julia's Cuint.

A very basic fix is to ensure that line 3 in arrays.jl reads:

typealias CUdeviceptr Uint64

and that the examples/makefile compiles the kernel with -m64:

nvcc -ptx -m64 vadd.cu
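
One way to keep the two settings from drifting apart is to derive the nvcc machine flag from the width of the pointer type. The following is only a sketch under assumptions: `machine_flag` is a hypothetical helper, not part of CUDA.jl, and the code uses current Julia spellings (`const ... = UInt64`) instead of the `typealias ... Uint64` form shown above. It builds the command without running it.

```julia
# Modern spelling of `typealias CUdeviceptr Uint64` (assumption: current Julia).
const CUdeviceptr = UInt64

# Hypothetical helper: map a pointer type to the matching nvcc machine flag.
machine_flag(::Type{T}) where {T} = "-m$(8 * sizeof(T))"

# Build (but do not execute) the compile command for the example kernel.
cmd = `nvcc -ptx $(machine_flag(CUdeviceptr)) vadd.cu`
println(cmd)
```

With `CUdeviceptr` set to a 32-bit type, the same helper would yield `-m32`, so the PTX and the Julia side stay in agreement.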

For example, printing the CUdeviceptr in cualloc via:

function cualloc(T::Type, len::Integer)
    a = CUdeviceptr[0]  
    nbytes = int(len) * sizeof(T)
    @cucall(:cuMemAlloc, (Ptr{CUdeviceptr}, Csize_t), a, nbytes)    
    println(CuPtr(a[1]))
    return CuPtr(a[1])
end

on my machine prints CuPtr(0x204c0400), a 32-bit address,

whereas running ex1.jl under cuda-memcheck shows:

cuda-memcheck julia ex1.jl
...
...
...
========= Invalid __global__ read of size 4
=========     at 0x00000068 in vadd
=========     by thread (99,0,0) in block (0,0,0)
=========     Address 0x7f8e204c018c is out of bounds
...
...
...
CuDriverError(700): Kernel launch failed

The address shown is 64-bit. Over multiple runs it is clear that the lower 32 bits of the address are correct and the upper 32 bits are undefined.
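
The split between the two halves can be checked with a little bit arithmetic on the addresses quoted above. This is a sketch, not a diagnosis: it uses current Julia spellings (`UInt32`, `UInt64`), the variable names are illustrative, and it rests on two assumptions. The first is that thread 99 reads element 99 of an array of 4-byte values, as the "read of size 4" suggests. The second is that the failure comes from a 4-byte pointer being written into a parameter slot that the 64-bit kernel reads as 8 bytes (little-endian host).

```julia
# Addresses copied from the output above.
reported  = 0x00007f8e204c018c   # 64-bit address flagged by cuda-memcheck
allocated = 0x204c0400           # 32-bit value printed by cualloc

# Split the reported address into its two 32-bit halves.
low32  = UInt32(reported & 0xffffffff)
high32 = UInt32(reported >> 32)
println("low  32 bits: 0x", string(low32,  base = 16))   # 0x204c018c
println("high 32 bits: 0x", string(high32, base = 16))   # 0x7f8e

# Assumption: thread 99 reads element 99 of a 4-byte array, i.e. base + 99 * 4.
implied_base = low32 - UInt32(99 * 4)
println("implied base: 0x", string(implied_base, base = 16))  # 0x204c0000

# Suspected mechanism: only 4 bytes of an 8-byte slot are written, 8 are read.
slot = fill(0xaa, 8)                         # stale bytes standing in for "undefined"
slot[1:4] = reinterpret(UInt8, [allocated])  # the 32-bit pointer passed from Julia
as_seen_by_kernel = reinterpret(UInt64, slot)[1]
println("64-bit read:  0x", string(as_seen_by_kernel, base = 16))
```

The implied base lies 0x400 bytes below the pointer printed by cualloc, which is consistent with a neighbouring allocation in the same region, while the upper half of the simulated 64-bit read is whatever happened to be in the slot.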
