kwmartin opened this issue 1 year ago
Hi @kwmartin -- can you try passing the path to `libdevice.10.bc` through the `-libdevice` flag (i.e. `-libdevice /path/to/libdevice.10.bc` -- https://docs.exaloop.io/codon/advanced/gpu#libdevice)? Let me know if that works.
Unfortunately no:
```
$ codon run -libdevice=/usr/lib/cuda/nvvm/libdevice/libdevice.10.bc tst.codon
JIT session error: Symbols not found: [ seq_nvptx_memcpy_d2h, seq_nvptx_memcpy_h2d, seq_nvptx_device_free, seq_nvptx_function, seq_nvptx_invoke, seq_nvptx_load_module, seq_nvptx_device_alloc ]
Failure value returned from cantFail wrapped call
Failed to materialize symbols: { (main, { std.internal.types.range.range:std.internal.types.range.range.__iter__:1[std.internal.types.range.range].183.resume, std.internal.types.error.LookupError:std.internal.types.error.LookupError.__init__:3[std.internal.types.error.LookupError,str,str].513, ._lambda_136:0[int].271, std.gpu.Memory[byte]:std.gpu.Memory._free:0[std.gpu.Memory[byte]].631, int:int.__mul__:1[int,int].14, std.internal.types.error.IndexError:std.internal.types.err
...
```
tst.codon:

```python
import gpu

MAX = 1000  # maximum Mandelbrot iterations
N = 4096    # width and height of image
pixels = [0 for _ in range(N * N)]

def scale(x, a, b):
    return a + (x/N)*(b - a)

@gpu.kernel
def mandelbrot(pixels):
    idx = (gpu.block.x * gpu.block.dim.x) + gpu.thread.x
    i, j = divmod(idx, N)
    c = complex(scale(j, -2.00, 0.47), scale(i, -1.12, 1.12))
    z = 0j
    iteration = 0

    while abs(z) <= 2 and iteration < MAX:
        z = z**2 + c
        iteration += 1

    pixels[idx] = int(255 * iteration/MAX)

mandelbrot(pixels, grid=(N*N)//1024, block=1024)
```
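As a side note, the per-pixel logic of the kernel can be sanity-checked in plain Python without the GPU path at all; this is only a sketch mirroring the kernel body above (the `pixel` helper is mine, not part of the original code):

```python
MAX = 1000  # maximum Mandelbrot iterations
N = 4096    # width and height of image

def scale(x, a, b):
    # Map a pixel coordinate x in [0, N) onto the interval [a, b)
    return a + (x / N) * (b - a)

def pixel(idx):
    # Same computation the kernel performs for one flat index idx
    i, j = divmod(idx, N)
    c = complex(scale(j, -2.00, 0.47), scale(i, -1.12, 1.12))
    z = 0j
    iteration = 0
    while abs(z) <= 2 and iteration < MAX:
        z = z**2 + c
        iteration += 1
    return int(255 * iteration / MAX)
```

For example, `pixel(0)` (top-left corner, c = -2 - 1.12j, which escapes immediately) gives 0, while an index near the origin of the complex plane hits the iteration cap and gives 255.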
```python
@gpu.kernel
def hello(a, b, c):
    i = gpu.thread.x
    c[i] = a[i] + b[i]

a = [i for i in range(16)]
b = [2*i for i in range(16)]
c = [0 for _ in range(16)]

hello(a, b, c, grid=1, block=16)
print(c)
```
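For comparison, a hypothetical CPU-only equivalent of the `hello` kernel (a loop standing in for the 16 GPU threads, with `i` playing the role of `gpu.thread.x`) shows the expected result while the GPU runtime symbols are unresolved:

```python
def hello_cpu(a, b, c):
    # One loop iteration per GPU thread: thread i writes c[i] = a[i] + b[i]
    for i in range(len(c)):
        c[i] = a[i] + b[i]

a = [i for i in range(16)]
b = [2 * i for i in range(16)]
c = [0 for _ in range(16)]
hello_cpu(a, b, c)
# c is now [0, 3, 6, ..., 45], i.e. 3*i for each index i
```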