exaloop / codon

A high-performance, zero-overhead, extensible Python compiler using LLVM
https://docs.exaloop.io/codon

Trying to run NVIDIA GPU example fails to find libdevice.10.bc #400

Open kwmartin opened 1 year ago

kwmartin commented 1 year ago

Running

> codon run gpuEx.py

fails with:

libdevice.10.bc: error: Could not open input file: No such file or directory

where gpuEx.py is:


import gpu

@gpu.kernel
def hello(a, b, c):
    i = gpu.thread.x
    c[i] = a[i] + b[i]

a = [i for i in range(16)]
b = [2*i for i in range(16)]
c = [0 for _ in range(16)]

hello(a, b, c, grid=1, block=16)
print(c)
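
For reference, with these inputs the kernel computes c[i] = a[i] + b[i] = i + 2*i = 3*i, so a successful run should print:

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45]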


OS: Ubuntu 22.04

libdevice.10.bc does exist on the system:
> ls /usr/lib/cuda/nvvm/libdevice
.  ..  libdevice.10.bc
I also tried adding /usr/lib/cuda/nvvm/libdevice to /etc/ld.so.conf, ran sudo ldconfig, and verified the directory was in the ldconfig path using:

ldconfig -v | grep nvvm
...
/usr/lib/cuda/nvvm/libdevice: (from /etc/ld.so.conf:2)

I also tried putting a symbolic link in /usr/lib/
> ll /usr/lib/libdevice.10.bc
lrwxrwxrwx 1 root root 44 Jun  6 21:14 /usr/lib/libdevice.10.bc -> /usr/lib/cuda/nvvm/libdevice/libdevice.10.bc
None of the above was successful. Any ideas?
arshajii commented 1 year ago

Hi @kwmartin -- can you try passing the path to libdevice.10.bc through the -libdevice flag (i.e. -libdevice /path/to/libdevice.10.bc; see https://docs.exaloop.io/codon/advanced/gpu#libdevice)? Let me know if that works.
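
With the path reported above, that would be (assuming the Ubuntu cuda-toolkit layout from the original report):

> codon run -libdevice /usr/lib/cuda/nvvm/libdevice/libdevice.10.bc gpuEx.py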

kwmartin commented 1 year ago

Unfortunately no:

codon run -libdevice=/usr/lib/cuda/nvvm/libdevice/libdevice.10.bc tst.codon
JIT session error: Symbols not found: [ seq_nvptx_memcpy_d2h, seq_nvptx_memcpy_h2d, seq_nvptx_device_free, seq_nvptx_function, seq_nvptx_invoke, seq_nvptx_load_module, seq_nvptx_device_alloc ]
Failure value returned from cantFail wrapped call
Failed to materialize symbols: { (main, { std.internal.types.range.range:std.internal.types.range.range.__iter__:1[std.internal.types.range.range].183.resume, std.internal.types.error.LookupError:std.internal.types.error.LookupError.__init__:3[std.internal.types.error.LookupError,str,str].513, ._lambda_136:0[int].271, std.gpu.Memory[byte]:std.gpu.Memory._free:0[std.gpu.Memory[byte]].631, int:int.__mul__:1[int,int].14, std.internal.types.error.IndexError:std.internal.types.err
...
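
(Note for anyone hitting the same error: the missing seq_nvptx_* symbols come from Codon's runtime library, libcodonrt, which only provides its GPU entry points when Codon was built with GPU support enabled. A quick way to check whether your installed runtime exports them, assuming the default install layout under ~/.codon, is:

> nm -D ~/.codon/lib/codon/libcodonrt.so | grep seq_nvptx

If this prints nothing, the installed build likely has no GPU runtime, and pointing -libdevice at the right file won't resolve it on its own.)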

tst.codon:

import gpu

MAX    = 1000  # maximum Mandelbrot iterations
N      = 4096  # width and height of image
pixels = [0 for _ in range(N * N)]

def scale(x, a, b):
    return a + (x/N)*(b - a)

@gpu.kernel
def mandelbrot(pixels):
    idx = (gpu.block.x * gpu.block.dim.x) + gpu.thread.x
    i, j = divmod(idx, N)
    c = complex(scale(j, -2.00, 0.47), scale(i, -1.12, 1.12))
    z = 0j
    iteration = 0

    while abs(z) <= 2 and iteration < MAX:
        z = z**2 + c
        iteration += 1

    pixels[idx] = int(255 * iteration/MAX)

mandelbrot(pixels, grid=(N*N)//1024, block=1024)
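
A minimal host-side sanity check after the kernel call (a hypothetical addition, not part of the original report) would be:

print(max(pixels), min(pixels))  # a nonzero max suggests the kernel actually wrote results back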