NVIDIA / numba-cuda

[FEA] Support automatic generation of `link` list for external code #67

Open gmarkall opened 2 weeks ago

gmarkall commented 2 weeks ago

Numba-cuda extensions (e.g. nvmath-python) frequently rely on CUDA C++ implementations for the core of their functionality.

One current UX limitation is that the kernel author must pass the list of files and/or code to link with a kernel as the link keyword argument to the @cuda.jit decorator, for example:

@cuda.jit(link=FFT.files)
def f(data):
    ...

(This example is from cufftdx_simple_fft_block.py.)

The FFT object supplies the files, and is created like:

FFT = fft(fft_type='c2c', 
          size=128,
          precision=np.float32,
          direction='forward',
          elements_per_thread=8,
          ffts_per_block=2,
          execution='Block',
          compiler='numba')

and is called inside the kernel as:

FFT(thread_data, shared_mem)

Rather than requiring the user to link FFT.files, Numba should provide a mechanism to obtain the list of files / code (LTO-IR, PTX, CUDA C/C++ source, binaries / objects, etc.) from the FFT object (or from any method, property, object, etc. whose implementation is backed by an extension) and to link it at the point of compilation and linking. The implementation (of FFT, in this example) may generate code (e.g. LTO-IR) at this point, just before returning it to Numba.

leofang commented 2 weeks ago

Perhaps ask all device functions to implement an attribute, say, __numba_cuda_link__, that returns a list of files, if they want numba-cuda to handle the linking?
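To make the attribute idea concrete, here is a minimal sketch in plain Python that does not use Numba. It assumes __numba_cuda_link__ is a plain list stored on the callable; ExternalDeviceFunction, collect_link_files, and the file names are hypothetical names invented for illustration, not part of numba-cuda.

```python
class ExternalDeviceFunction:
    """Hypothetical extension-provided callable backed by external code."""

    def __init__(self, name, files):
        self.name = name
        # Attribute form of the proposed protocol: a plain list of files.
        self.__numba_cuda_link__ = list(files)


def collect_link_files(callees, user_link=()):
    """Merge explicit `link=` entries with files advertised by callees.

    Hypothetical helper: order is preserved and duplicates are dropped.
    """
    merged = list(user_link)
    for callee in callees:
        # Callables that do not opt in simply contribute nothing.
        for path in getattr(callee, "__numba_cuda_link__", ()):
            if path not in merged:
                merged.append(path)
    return merged


fft = ExternalDeviceFunction("fft", ["fft_impl.ltoir"])
plain = len  # has no __numba_cuda_link__ attribute
print(collect_link_files([fft, plain, fft], user_link=["extra.ptx"]))
# prints ['extra.ptx', 'fft_impl.ltoir']
```

With something like this, the kernel author would no longer need to write link=FFT.files; whether an explicit link= list should still be merged in, as done here, is a design choice left open by the discussion.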

gmarkall commented 2 weeks ago

Continuing with the __numba_cuda_link__ idea, I think it might need to be a method that can accept a signature, so that it can return the appropriate files for the given signature.
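A rough sketch of the signature-accepting variant follows, again in plain Python with no Numba dependency. Argument types are represented as strings purely for illustration, and SignatureAwareFFT, _generate, and generation_count are hypothetical names; real code generation (e.g. producing LTO-IR) is replaced by a placeholder.

```python
class SignatureAwareFFT:
    """Hypothetical device function that generates code per signature."""

    def __init__(self):
        self._generated = {}
        self.generation_count = 0

    def _generate(self, argtypes):
        # Placeholder for real code generation (e.g. producing LTO-IR).
        self.generation_count += 1
        return ["fft<" + ", ".join(argtypes) + ">"]

    def __numba_cuda_link__(self, *argtypes):
        # Generate lazily, once per distinct signature, then reuse.
        key = tuple(argtypes)
        if key not in self._generated:
            self._generated[key] = self._generate(key)
        return self._generated[key]


fft = SignatureAwareFFT()
first = fft.__numba_cuda_link__("float32[:]", "float32[::1]")
again = fft.__numba_cuda_link__("float32[:]", "float32[::1]")
other = fft.__numba_cuda_link__("float64[:]", "float64[::1]")
print(first is again, fft.generation_count)  # prints True 2
```

Caching per signature is an assumption of this sketch; it keeps repeated compilations of the same kernel from regenerating the same code.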

leofang commented 2 weeks ago

How should the kernel author pass function arguments at the call site if it is a method rather than an attribute?

gmarkall commented 2 weeks ago

In the example above, the kernel author wrote:

FFT(thread_data, shared_mem)

Assuming the Numba types of these are float32[:] and float32[::1] (for the sake of argument; they could be any Numba type), I'd expect that at compile time Numba would do the equivalent of calling

ltoir = FFT.__numba_cuda_link__(float32[:], float32[::1])

where ltoir is then an LTOIR linkable code object, i.e. an instance of

https://github.com/NVIDIA/numba-cuda/blob/e5f2ef936a6155712020f79e1b39b28f5d41e9ce/numba_cuda/numba/cuda/cudadrv/linkable_code.py#L59-L63
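The compile-time flow described above could look roughly like the following sketch, which runs without Numba or a GPU. StubLTOIR is a stand-in written for this example (it is not the numba-cuda class linked above), types are again plain strings, and link_inputs_for_call_sites is a hypothetical helper showing where the compiler might call the hook for each call site it encounters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubLTOIR:
    """Stand-in for an LTO-IR linkable code object (illustration only)."""
    data: bytes
    name: str


class StubFFT:
    """Hypothetical extension object implementing the proposed hook."""

    def __numba_cuda_link__(self, *argtypes):
        label = "fft(" + ", ".join(argtypes) + ")"
        return StubLTOIR(data=label.encode(), name=label)


def link_inputs_for_call_sites(call_sites):
    """Ask each callee for linkable code, given its argument types."""
    inputs = []
    for callee, argtypes in call_sites:
        hook = getattr(callee, "__numba_cuda_link__", None)
        if hook is None:
            continue  # ordinary callee: nothing extra to link
        item = hook(*argtypes)
        if item not in inputs:  # same code for repeated call sites
            inputs.append(item)
    return inputs


FFT = StubFFT()
call_sites = [
    (FFT, ("float32[:]", "float32[::1]")),
    (FFT, ("float32[:]", "float32[::1]")),
    (abs, ("float32",)),
]
inputs = link_inputs_for_call_sites(call_sites)
print([item.name for item in inputs])
# prints ['fft(float32[:], float32[::1])']
```

The kernel author never calls the hook directly in this sketch: the arguments come from the types inferred at the call site FFT(thread_data, shared_mem), which is one possible answer to the question of how arguments would be passed.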