Running C-Lisp code on Nvidia GPUs

GlowingScrewdriver commented 4 months ago

This PR introduces a small framework for compiling kernels written in C-Lisp and launching them on an Nvidia GPU.

Documentation on this issue can be found here: https://outline.von-neumann.ai/s/d0cd5eb9-2e15-4fa4-bd17-c3911f305008/doc/attacking-ptx-kKZJzn5Uem

GlowingScrewdriver commented 4 months ago

Address space markers have been implemented in C-Lisp. The test kernels in vecadd.sexp and matmul.sexp now specify address space 1 (global memory) for pointers passed in by the caller.
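For context, LLVM's NVPTX backend numbers its address spaces as in the table below. Address space 1 is global memory, which is where device buffers allocated by the host (e.g. with cuMemAlloc) live. The pointer_type helper is a hypothetical illustration, not part of the C-Lisp compiler, of how the marker appears in the typed-pointer IR syntax used by the generated .ll files.

```python
# NVPTX address space numbers as used by LLVM's NVPTX backend.
NVPTX_ADDRESS_SPACES = {
    0: "generic",
    1: "global",
    3: "shared",
    4: "constant",
    5: "local",
}


def pointer_type(pointee: str, addrspace: int = 0) -> str:
    """Hypothetical helper: format a typed-pointer LLVM IR type string.

    Address space 0 (generic) is the default and is written without a marker.
    """
    if addrspace not in NVPTX_ADDRESS_SPACES:
        raise ValueError(f"unknown NVPTX address space: {addrspace}")
    if addrspace == 0:
        return f"{pointee}*"
    return f"{pointee} addrspace({addrspace})*"


# A kernel argument pointing at a host-allocated device buffer is in global memory.
print(pointer_type("float", 1), "->", NVPTX_ADDRESS_SPACES[1])
```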

chsasank commented 4 months ago

Do we also need addrspace for alloca? It looks like we do:

From the LLVM docs:

‘alloca’ Instruction

Syntax:

    <result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] [, addrspace(<num>)]     ; yields type addrspace(num)*:result

The ‘alloca’ instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. If the address space is not explicitly specified, the object is allocated in the alloca address space from the datalayout string.

But llvmlite doesn't seem to support it. Maybe we don't need to implement it at this point.
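The LangRef grammar quoted above can be illustrated with a small, hypothetical text emitter. llvmlite's IRBuilder.alloca() takes a type, an optional size and a name, and does not appear to take an address space, so this sketch only shows what the textual instruction would look like with and without the optional clauses; it is an assumption for illustration, not something this PR implements.

```python
from typing import Optional


def alloca_line(result: str, ty: str, num_elements: Optional[str] = None,
                align: Optional[int] = None,
                addrspace: Optional[int] = None) -> str:
    """Hypothetical helper: build an alloca following the LangRef grammar

    <result> = alloca <type> [, <ty> <NumElements>] [, align <alignment>] [, addrspace(<num>)]

    When addrspace is None the clause is omitted, so LLVM uses the alloca
    address space from the module's datalayout string.
    """
    parts = [f"{result} = alloca {ty}"]
    if num_elements is not None:
        parts.append(num_elements)  # e.g. "i32 4" for four elements
    if align is not None:
        parts.append(f"align {align}")
    if addrspace is not None:
        parts.append(f"addrspace({addrspace})")
    return ", ".join(parts)


print(alloca_line('%"tmp"', "i32"))
print(alloca_line('%"tmp"', "i32", align=4, addrspace=1))
```

Clauses are appended in the order the grammar lists them, which keeps the output parseable by llc when several are present.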

GlowingScrewdriver commented 4 months ago

Regarding "Do we also need addrspace for alloca?":

I don't think we need it for now. Using alloca in the kernel without specifying addrspace works, and it doesn't cause llc to insert address space casts.

On the other hand, I got an error from the CUDA driver API when I specified addrspace on a stack allocation. This is what I tried:

1. Generate vecadd.ll by running: make vecadd.ll
2. Add addrspace(1) to one of the alloca instructions, e.g. change
   %"tmp_clisp-ckravzof" = alloca i32
   to
   %"tmp_clisp-ckravzof" = alloca i32, addrspace(1)
3. Update the instructions that access this pointer accordingly, e.g. change
   store i32 %"tmp_clisp-ckravzof.1", i32* %"tmp_clisp-ckravzof"
   to
   store i32 %"tmp_clisp-ckravzof.1", i32 addrspace(1)* %"tmp_clisp-ckravzof"
4. Run the modified kernel by running: make vecadd.run
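Steps 2 and 3 above could be automated with a small script. This is a hypothetical sketch, not part of the PR's Makefile: it assumes the typed-pointer IR syntax seen in the generated vecadd.ll (e.g. i32* rather than LLVM's newer opaque ptr type) and only retypes uses where the pointer type is written immediately before the variable name.

```python
import re


def move_alloca_to_addrspace(ir: str, name: str, addrspace: int) -> str:
    """Hypothetical helper: put one alloca in `addrspace` and retype its uses.

    Assumes typed-pointer IR, where uses look like "T* %name".
    """
    var = re.escape(name)

    # Step 2: "%name = alloca T[, align A]" -> "..., addrspace(N)"
    ir, count = re.subn(
        rf"^([ \t]*{var} = alloca [^\n]+)$",
        lambda m: f"{m.group(1)}, addrspace({addrspace})",
        ir,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected one alloca named {name}, found {count}")

    # Step 3: "T* %name" -> "T addrspace(N)* %name". The lookahead keeps an
    # unquoted %x from also matching longer names such as %x1 or %x.1.
    return re.sub(
        rf"(\S+)\* {var}(?![\w.\-])",
        lambda m: f"{m.group(1)} addrspace({addrspace})* {name}",
        ir,
    )


vecadd_snippet = """\
  %"tmp_clisp-ckravzof" = alloca i32
  store i32 %"tmp_clisp-ckravzof.1", i32* %"tmp_clisp-ckravzof"
"""
print(move_alloca_to_addrspace(vecadd_snippet, '%"tmp_clisp-ckravzof"', 1))
```

Doing the rewrite with regular expressions keeps the sketch dependency-free; a more robust tool would parse the module (for instance with llvmlite's binding layer) instead of matching text.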

The driver reported: cuCtxSynchronize() returned non-zero status 717. According to Nvidia's documentation, result code 717 (CUDA_ERROR_INVALID_ADDRESS_SPACE) means the device executed an instruction that can only operate on memory in certain address spaces (global, shared, or local) but was given an address outside the allowed space.
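When only a numeric status is printed, the driver API can translate it: cuGetErrorName(CUresult error, const char **pStr) returns the enum name. The sketch below calls it through ctypes when libcuda can be found and otherwise falls back to a small hard-coded table of a few cuda.h values; the cuda_error_name function and the fallback table are hypothetical, for illustration only.

```python
import ctypes
import ctypes.util

# A few CUresult values from cuda.h, used when libcuda cannot be loaded.
FALLBACK_CURESULT_NAMES = {
    0: "CUDA_SUCCESS",
    700: "CUDA_ERROR_ILLEGAL_ADDRESS",
    716: "CUDA_ERROR_MISALIGNED_ADDRESS",
    717: "CUDA_ERROR_INVALID_ADDRESS_SPACE",
    719: "CUDA_ERROR_LAUNCH_FAILED",
}


def cuda_error_name(code: int) -> str:
    """Hypothetical helper: map a CUDA driver API result code to its enum name."""
    path = ctypes.util.find_library("cuda")
    if path is not None:
        try:
            libcuda = ctypes.CDLL(path)
            # CUresult cuGetErrorName(CUresult error, const char **pStr)
            fn = libcuda.cuGetErrorName
            fn.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
            fn.restype = ctypes.c_int
            name = ctypes.c_char_p()
            if fn(code, ctypes.byref(name)) == 0 and name.value:
                return name.value.decode()
        except (OSError, AttributeError):
            pass  # driver library missing or lacks the symbol; use the table
    return FALLBACK_CURESULT_NAMES.get(code, f"unknown CUresult {code}")


print(cuda_error_name(717))
```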

chsasank/llama.lisp, pull request #71