Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contain LTOIR

isVoid commented 1 month ago

This PR supersedes #60 due to a write-permission issue.

isVoid commented 1 month ago

I'm not able to reproduce this segfault on my V100 machine:

test_namedunituple (numba.cuda.tests.cudapy.test_array_args.TestCudaArrayArg.test_namedunituple) ... Fatal Python error: Segmentation fault
...

Extension modules: numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, numba.core.typeconv._typeconv, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, numba.mviewbuf, pynvjitlink._nvjitlinklib, numba.types.itertools (total: 22)
ci/test_conda_pynvjitlink.sh: line 72:  2238 Segmentation fault      (core dumped) ENABLE_PYNVJITLINK=1 NUMBA_CUDA_TEST_BIN_DIR=$NUMBA_CUDA_TEST_BIN_DIR python -m numba.runtests numba.cuda.tests -v
/__w/numba-cuda/numba-cuda

isVoid commented 4 weeks ago

In afcce87 I added an ignore_nonlto flag to the linker so that, when it is enabled, only LTO-able objects are added. This keeps the driver in the correct state when the -ptx flag is set. It is also the desired behavior, since Numba now dumps optimized PTX only for the LTO-able inputs added to the linker and raises a warning for any source that isn't optimizable.
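
To illustrate the filtering described above, here is a minimal, self-contained sketch. It is not the code from this PR: `SketchLinker`, `add_input`, and the `LTO_CAPABLE_KINDS` set are hypothetical names, and which input kinds count as LTO-able is an assumption made only for the example.

```python
import warnings
from dataclasses import dataclass, field

# Assumption for this sketch: only CUDA source and LTOIR inputs are LTO-able.
LTO_CAPABLE_KINDS = frozenset({"cu", "ltoir"})


@dataclass
class SketchLinker:
    """Hypothetical stand-in for a linker; not the numba-cuda class."""

    ignore_nonlto: bool = False
    inputs: list = field(default_factory=list)

    def add_input(self, name, kind):
        """Record an input, skipping non-LTO-able ones when ignore_nonlto is set."""
        if self.ignore_nonlto and kind not in LTO_CAPABLE_KINDS:
            # Skip the input but tell the user it will not be optimized.
            warnings.warn(
                f"Not adding {name!r} ({kind}): it cannot be link-time optimized"
            )
            return False
        self.inputs.append((name, kind))
        return True


linker = SketchLinker(ignore_nonlto=True)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for name, kind in [
        ("kernel.cu", "cu"),
        ("helper.ltoir", "ltoir"),
        ("legacy.ptx", "ptx"),
    ]:
        linker.add_input(name, kind)
```

In this sketch, `legacy.ptx` is skipped with a warning while the other two inputs are kept, so a later PTX dump would cover only the inputs that actually took part in LTO.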

isVoid commented 4 weeks ago

A subtle point here is that lto=True is only enabled for CUDA > 12.0, and is tested accordingly, because this feature depends on pynvjitlink, which is only tested in the CTK 12.5 environment.

NVIDIA / numba-cuda

Allow CUDA source inputs compiled to LTOIR, and enable pynvjitlinker to link inputs that contain LTOIR #62