NVIDIA / numba-cuda


Add NRT c++ functions #17

Open brandon-b-miller opened 1 month ago

brandon-b-miller commented 1 month ago

This PR adds device-side implementations of some of the NRT C API. Combined with the change to the CUDATargetContext object, this allows launching kernels like this:

from numba import cuda
import numpy as np

@cuda.jit
def f(x):
    return x[:5]

@cuda.jit('void()', link=['nrt.cu'])
def g():
    x = cuda.shared.array(10, dtype=np.int32)
    f(x)

g[1,1]()
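For a quick sanity check that the kernel is wired up to the device-side NRT, the generated PTX can be searched for references to the NRT symbols. This is only a sketch; it assumes inspect_asm() returns a dict of {signature: PTX} for the eagerly compiled kernel above:

# Sketch: confirm the kernel's PTX references the device-side NRT entry points
# that nrt.cu has to provide.
ptx = next(iter(g.inspect_asm().values()))
for sym in ("NRT_MemInfo_alloc_aligned", "NRT_MemInfo_data_fast", "NRT_incref"):
    print(sym, "referenced:", sym in ptx)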

Several things are still missing as of the time of writing:

gmarkall commented 1 month ago

Some additional changes needed for adding the nrt test module: https://github.com/gmarkall/numba-cuda/commit/9bb662d7ca8e5319bc1507f1aa01133cad24c532

gmarkall commented 1 month ago

I tried to run the test but I had to work around a couple of other issues:

At that point, running the test with:

$ python ../numba/runtests.py numba.cuda.tests.nrt.test_nrt

is giving me:

numba.cuda.cudadrv.driver.LinkerError: [300] Call to cuLinkComplete results in CUDA_ERROR_INVALID_SOURCE
error   : Undefined reference to 'NRT_MemInfo_data_fast' in '<cudapy-ptx>'
error   : Undefined reference to 'NRT_MemInfo_alloc_aligned' in '<cudapy-ptx>'
gmarkall commented 1 month ago

With the changes from my branch above, and syncing before the original example exits:

from numba import cuda
import numpy as np

@cuda.jit
def f(x):
    return x[:5]

@cuda.jit
def g():
    x = cuda.shared.array(10, dtype=np.int32)
    f(x)

g[1,1]()
cuda.synchronize()

we get:

numba.cuda.cudadrv.driver.CudaAPIError: [700] Call to cuCtxSynchronize results in UNKNOWN_CUDA_ERROR

I presume this also happens for you, but without the synchronize at the end, Python exits before the error is detected?
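Following on from the repro above, one way to make that explicit is to synchronize inside a try/except so the failure is attributed to the launch rather than to interpreter shutdown (the wrapper below is illustrative, not part of the original repro):

# Sketch: surface the asynchronous launch failure at a known point instead of
# at interpreter exit. CudaAPIError is the exception type from the traceback above.
from numba.cuda.cudadrv.driver import CudaAPIError

try:
    g[1, 1]()
    cuda.synchronize()   # raises here if the kernel faulted
except CudaAPIError as e:
    print("kernel faulted:", e)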

gmarkall commented 1 month ago

I see, it looks like the meminfo is null:

========= Invalid __global__ atomic of size 8 bytes
=========     at NRT_incref+0xb0
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x0 is out of bounds
=========     and is 8,703,180,800 bytes before the nearest allocation at 0x206c00000 of size 8,388,864 bytes
=========     Device Frame:cudapy::__main__::g[abi:v1,cw51cXTLSUwv1sCUt9Ww0FGw1NRRwGKFLSgtAWVgaQPKRwoMW3A4A4MbLeahKhGJDEkANeHAJJA8YlutDgtedSsFaDSpAyWBPONaTQA_3d]+0x40 in /home/gmarkall/numbadev/issues/numba-cuda-17/original_repro.py:10
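For reference, a report like the one above comes from running the repro under compute-sanitizer. A minimal way to drive that from Python, assuming the tool is on PATH and the repro lives in original_repro.py (the file named in the device frame):

# Sketch: run the original repro under compute-sanitizer to get the invalid
# access report shown above.
import subprocess

subprocess.run(["compute-sanitizer", "python", "original_repro.py"], check=False)
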
copy-pr-bot[bot] commented 1 month ago

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gmarkall commented 1 month ago

/ok to test

gmarkall commented 2 weeks ago

From CI:

numba.cuda.cudadrv.error.NvrtcError: NVRTC Compilation failure whilst compiling nrt.cu:

nrt.cu(4): catastrophic error: cannot open source file "cuda/atomic"

I wonder if there are additional CUDA toolkit packages on which this introduces a dependency. Perhaps we need to install more in the CI environment (or perhaps the NVRTC include path is just set incorrectly).

brandon-b-miller commented 1 week ago

> From CI:
>
> numba.cuda.cudadrv.error.NvrtcError: NVRTC Compilation failure whilst compiling nrt.cu:
>
> nrt.cu(4): catastrophic error: cannot open source file "cuda/atomic"
>
> I wonder if there are additional CUDA toolkit packages on which this introduces a dependency. Perhaps we need to install more in the CI environment (or perhaps the NVRTC include path is just set incorrectly).

I think cuda/atomic is meant to come in from libcudacxx. I'm not sure Numba has this dependency yet. I will follow up here.

gmarkall commented 1 week ago

> I think cuda/atomic is meant to come in from libcudacxx. I'm not sure Numba has this dependency yet. I will follow up here.

I think we need to install / depend on the cuda-cccl package.
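If it helps narrow that down, a rough way to see where an installed cuda-cccl actually put the cuda/atomic header (the candidate roots below are guesses, not a claim about the package layout):

# Sketch: search likely prefixes for the libcudacxx "cuda/atomic" header so we
# know which directory NVRTC needs on its include path.
import glob
import os
import sysconfig

roots = [
    sysconfig.get_paths()["data"],        # the active env prefix (conda env / venv)
    sysconfig.get_paths()["purelib"],     # site-packages, for pip wheels
    os.environ.get("CUDA_HOME", "/usr/local/cuda"),
]
for root in roots:
    for hit in glob.glob(os.path.join(root, "**", "cuda", "atomic"), recursive=True):
        # strip the trailing "cuda/atomic" to get the directory to pass as -I
        print(os.path.dirname(os.path.dirname(hit)))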

gmarkall commented 1 week ago

I think Numba is still not finding the headers because it's not necessarily looking in the right place for them: we have an environment variable to set the CUDA include path, but we probably need a way for Numba to locate these headers when they're installed elsewhere.
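A stopgap along these lines might work in the meantime (assuming the variable in question is NUMBA_CUDA_INCLUDE_PATH, which is an assumption on my part, and with an illustrative path):

# Sketch: point the CUDA include path variable (name assumed to be
# NUMBA_CUDA_INCLUDE_PATH) at a directory that contains cuda/atomic, before
# Numba reads its configuration. The path below is illustrative.
import os

os.environ["NUMBA_CUDA_INCLUDE_PATH"] = "/usr/local/cuda/include"

from numba import cuda  # import after setting the variable so config picks it up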

brandon-b-miller commented 2 days ago

> I think Numba is still not finding the headers because it's not necessarily looking in the right place for them: we have an environment variable to set the CUDA include path, but we probably need a way for Numba to locate these headers when they're installed elsewhere.

Looks like when the cccl package is installed, there are a number of places the files can end up. I have a hunch that the real way to fix this is to rework things in terms of CMake, so that cccl (as well as the other pieces of CUDA we rely on) is discoverable seamlessly in any environment. Then we can embed the paths from CMake into a file (sketched below) and Numba can pick them up from there. I will seek some feedback on this idea.

Without that, we might be stuck with a separate environment variable in the short term.
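To sketch the "embed the paths into a file" part of that idea (the module name and variable names below are hypothetical):

# _cuda_paths_generated.py -- hypothetical module written out by the build
# system (e.g. a CMake configure step); numba-cuda would import it to locate
# headers instead of guessing include paths at runtime.
CCCL_INCLUDE_DIR = "/opt/conda/envs/numba/include"    # illustrative value
CUDA_INCLUDE_DIR = "/usr/local/cuda/include"          # illustrative value

NVRTC_INCLUDE_DIRS = [CCCL_INCLUDE_DIR, CUDA_INCLUDE_DIR]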