NVIDIA / cuda-python

CUDA Python Low-level Bindings
https://nvidia.github.io/cuda-python/

Cannot use CUDA Python to interact with external C/C++ code #22

Closed: shwina closed this issue 2 years ago

shwina commented 2 years ago

The problem

The Cython example below demonstrates an attempt to use CUDA Python to interact with some external C++ code. Note that the "external code" is included inline in the Cython source as a verbatim C++ block.

# distutils: language=c++
# distutils: extra_compile_args=-I/usr/local/cuda/include/

from cuda.ccudart cimport cudaMemAllocationHandleType

cdef extern from *:
    """
    #include <cuda_runtime_api.h>

    void foo(cudaMemAllocationHandleType x) {
        return;
    }
    """
    void foo(cudaMemAllocationHandleType x)

foo(cudaMemAllocationHandleType.cudaMemHandleTypeNone)

The external code is a function foo that accepts a cudaMemAllocationHandleType. We attempt to invoke that function from Cython by passing in a cuda.ccudart.cudaMemAllocationHandleType, but this fails with an error like:

error: cannot convert '__pyx_t_4cuda_7ccudart_cudaMemAllocationHandleType' to 'cudaMemAllocationHandleType'
 4857 |   foo(__pyx_e_4cuda_7ccudart_cudaMemHandleTypeNone);

To reproduce the problem, save the example above to a file foo.pyx, then run cythonize -i foo.pyx.

Why this happens

This happens because foo expects the cudaMemAllocationHandleType defined in the CUDA runtime headers. CUDA Python, however, "rewrites" the runtime API at the Cython layer and defines its own cudaMemAllocationHandleType, which ends up with a mangled name (the __pyx_t_4cuda_7ccudart_cudaMemAllocationHandleType seen in the error above) when transpiled from Cython to C++. To the C++ compiler, the two are distinct, non-interchangeable types.
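
To make the clash concrete, here is a minimal Cython sketch (illustrative only; the rt-prefixed alias names are made up, not part of cuda-python) contrasting the two declarations. The extern block binds to the enum defined in the CUDA runtime header, while the cimport refers to CUDA Python's redefinition:

# Illustrative sketch: two declarations that look alike in Cython but are
# distinct types in the generated C++. The "rt"-prefixed names are
# hypothetical aliases introduced here for the illustration.

# CUDA Python's copy: becomes
# __pyx_t_4cuda_7ccudart_cudaMemAllocationHandleType in the generated C++.
from cuda.ccudart cimport cudaMemAllocationHandleType

cdef extern from "driver_types.h":
    # The runtime's own enum: the quoted string is the real C++ name
    # that Cython will emit wherever the alias is used.
    ctypedef enum rtMemAllocationHandleType "cudaMemAllocationHandleType":
        rtMemHandleTypeNone "cudaMemHandleTypeNone"

# A foo declared to take rtMemAllocationHandleType compiles cleanly; one
# declared with the cimported type triggers the "cannot convert" error above.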

A potential solution

A potential solution, proposed by @leofang in an offline discussion, is to use extern declarations for types in ccudart.pxd rather than redefining them. For example:

diff --git a/cuda/ccudart.pxd b/cuda/ccudart.pxd
index 57e1e96..6c0b5d4 100644
--- a/cuda/ccudart.pxd
+++ b/cuda/ccudart.pxd
@@ -678,11 +678,12 @@ cdef enum cudaMemAllocationType:
     cudaMemAllocationTypePinned = 1
     cudaMemAllocationTypeMax = 2147483647

-cdef enum cudaMemAllocationHandleType:
-    cudaMemHandleTypeNone = 0
-    cudaMemHandleTypePosixFileDescriptor = 1
-    cudaMemHandleTypeWin32 = 2
-    cudaMemHandleTypeWin32Kmt = 4
+cdef extern from 'driver_types.h':
+    ctypedef enum cudaMemAllocationHandleType 'cudaMemAllocationHandleType':
+        cudaMemHandleTypeNone = 0
+        cudaMemHandleTypePosixFileDescriptor = 1
+        cudaMemHandleTypeWin32 = 2
+        cudaMemHandleTypeWin32Kmt = 4

 cdef struct cudaMemPoolProps:
     cudaMemAllocationType allocType
diff --git a/setup.py b/setup.py
index 394166e..16fad9f 100644
--- a/setup.py
+++ b/setup.py
@@ -30,6 +30,7 @@ except Exception:

 include_dirs = [
     os.path.dirname(sysconfig.get_path("include")),
+    '/usr/local/cuda-11.4/include',
 ]

 library_dirs = [get_python_lib(), os.path.join(os.sys.prefix, "lib")]

Gotcha

Currently, we ship a single version of CUDA Python that is built with the latest CUDA toolkit, and we expect it to work for older minor versions of the CUDA toolkit by leveraging CUDA enhanced compatibility.

Historically, there have been cases where the runtime API changed across minor versions of the CUDA toolkit. In particular, the names/ordering of enum members have changed between minor versions: CUDA 10.1 shipped the misspelled enum member cudaErrorDeviceUninitilialized, which was fixed in 10.2.

It's not clear how we would handle a situation like that with this approach. In the example above, we would somehow need separate extern declarations for 10.1 and 10.2.
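
One hypothetical way to express per-version declarations is Cython's compile-time IF conditional, keyed on a value injected at build time. Nothing like this exists in the repo today; the CUDA_TOOLKIT constant and the version encoding below are assumptions made purely for illustration:

# Hypothetical sketch only: CUDA_TOOLKIT would have to be injected at build
# time, e.g. cythonize(..., compile_time_env={'CUDA_TOOLKIT': 101}); it is
# not part of cuda-python.
IF CUDA_TOOLKIT < 102:
    cdef extern from "driver_types.h":
        ctypedef enum cudaError "cudaError":
            # CUDA 10.1 spelling (typo fixed upstream in 10.2)
            cudaErrorDeviceUninitilialized "cudaErrorDeviceUninitilialized"
ELSE:
    cdef extern from "driver_types.h":
        ctypedef enum cudaError "cudaError":
            cudaErrorDeviceUninitialized "cudaErrorDeviceUninitialized"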

jakirkham commented 2 years ago

cc @vzhurba01

vzhurba01 commented 2 years ago

Release v11.7.1 resolves the issue as per the suggestion. Thank you!