NVIDIA / cuda-python

CUDA Python Low-level Bindings
https://nvidia.github.io/cuda-python/

cross compilation failed #63

Closed ichergui closed 1 month ago

ichergui commented 1 month ago

Hi Team,

I'm trying to cross-compile cuda-python, but the build fails. I'm using this specific commit: https://github.com/NVIDIA/cuda-python/commit/2ae98f9338f9c13e777f6fc647637d8b87086a49

Please see the logs below:

| cuda/cudart.cpp: In function 'PyObject* __pyx_tp_new_4cuda_6cudart_cudaGraphNodeParams(PyTypeObject*, PyObject*, PyObject*)':
| cuda/cudart.cpp:360545:74: error: use of deleted function 'cudaGraphNodeParams::cudaGraphNodeParams()'
| 360545 |   new((void*)&(p->_cudaGraphNodeParams__val)) struct cudaGraphNodeParams();
|        |                                                                          ^
| In file included from cuda/cudart.cpp:1263:
| /media/m2/tegra-demo-distro/build-testdistro/tmp/work/armv8a_tegra234-oe4t-linux/python3-cuda/12.2.0/recipe-sysroot/usr/local/cuda-12.2/include/driver_types.h:3041:27: note: 'cudaGraphNodeParams::cudaGraphNodeParams()' is implicitly deleted because the default definition would be ill-formed:
|  3041 | struct __device_builtin__ cudaGraphNodeParams {
|       |                           ^~~~~~~~~~~~~~~~~~~
| /media/m2/tegra-demo-distro/build-testdistro/tmp/work/armv8a_tegra234-oe4t-linux/python3-cuda/12.2.0/recipe-sysroot/usr/local/cuda-12.2/include/driver_types.h:3047:56: error: union member 'cudaGraphNodeParams::<unnamed union>::kernel' with non-trivial 'cudaKernelNodeParamsV2::cudaKernelNodeParamsV2()'
|  3047 |         struct cudaKernelNodeParamsV2                  kernel;
|       |   

Any ideas? Thanks.

jakirkham commented 1 month ago

Could you please also share...?

  1. CUDA Toolkit version used
  2. How CUDA Toolkit was installed
  3. What compilers were used and versions
  4. What OS
  5. Steps to reproduce
  6. Python version
ichergui commented 1 month ago

Thanks @jakirkham for your quick reply.

  1. CUDA Toolkit version: 12.2.0
  2. CUDA Toolkit installation: I'm using Yocto recipes to install the CUDA dependency
  3. Compiler: GCC version 13.2.0
  4. Python version: 3.12.3
  5. OS: Linux
  6. Steps to reproduce:

$ git clone https://github.com/NVIDIA/cuda-python.git
$ cd cuda-python
$ git checkout 2ae98f9338f9c13e777f6fc647637d8b87086a49
$ python3 setup.py bdist_wheel --verbose --dist-dir ./dist
jakirkham commented 1 month ago

Thanks Ilies! 🙏

Would you be able to try with CUDA-Python 12.4.0 and CTK 12.4.0?

Know there was an issue with the CUDA Graph Management API ( https://github.com/NVIDIA/cuda-python/issues/55 ), which wasn't fixed until CUDA-Python 12.4.0

Also could you please edit your comment above to include the Python version you are using?

ichergui commented 1 month ago

Is CUDA-Python 12.4.0 compatible with CUDA Toolkit 12.2.0?

ichergui commented 1 month ago

@jakirkham I updated my previous comment

ichergui commented 1 month ago

@jakirkham I can reproduce the issue when building natively on the Jetson AGX Orin:

compile options: '-Icuda -I/usr/include -I/usr/local/cuda-12.2//include -I/usr/include/python3.10 -c'
extra options: '-std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3'
aarch64-linux-gnu-gcc: cuda/cudart.cpp
cuda/cudart.cpp: In function ‘PyObject* __pyx_tp_new_4cuda_6cudart_cudaGraphNodeParams(PyTypeObject*, PyObject*, PyObject*)’:
cuda/cudart.cpp:360560:74: error: use of deleted function ‘cudaGraphNodeParams::cudaGraphNodeParams()’
360560 |   new((void*)&(p->_cudaGraphNodeParams__val)) struct cudaGraphNodeParams();
       |                                                                          ^
In file included from cuda/cudart.cpp:1278:
/usr/local/cuda-12.2//include/driver_types.h:3041:27: note: ‘cudaGraphNodeParams::cudaGraphNodeParams()’ is implicitly deleted because the default definition would be ill-formed:
 3041 | struct __device_builtin__ cudaGraphNodeParams {
      |                           ^~~~~~~~~~~~~~~~~~~
/usr/local/cuda-12.2//include/driver_types.h:3047:56: error: union member ‘cudaGraphNodeParams::<unnamed union>::kernel’ with non-trivial ‘cudaKernelNodeParamsV2::cudaKernelNodeParamsV2()’
 3047 |         struct cudaKernelNodeParamsV2                  kernel;
      |                                                        ^~~~~~
error: Command "aarch64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Icuda -I/usr/include -I/usr/local/cuda-12.2//include -I/usr/include/python3.10 -c cuda/cudart.cpp -o build/temp.linux-aarch64-3.10/cuda/cudart.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3" failed with exit status 1
nvidia@tegra-ubuntu:~/cuda-python$ 

  1. CUDA Toolkit version: 12.2.0
  2. CUDA Toolkit installation: via $ sudo apt install cuda-toolkit-12-2
  3. Compiler: GCC version 11.4.0
  4. Python version: 3.10.12
  5. OS: Linux
  6. Steps to reproduce:

$ git clone https://github.com/NVIDIA/cuda-python.git
$ cd cuda-python
$ git checkout 2ae98f9338f9c13e777f6fc647637d8b87086a49
$ python3 setup.py bdist_wheel --verbose --dist-dir ./dist
jakirkham commented 1 month ago

> Is the CUDA-Python 12.4.0 compatible with CUDA ToolKit 12.2.0 ?

No, we will need CUDA Toolkit 12.4.0 as well.
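
The rule of thumb stated here (build CUDA-Python 12.4.0 against CUDA Toolkit 12.4.0) can be checked before starting a build. The following is a minimal sketch, not part of cuda-python: the helper names toolkit_version and can_build are hypothetical, and the "same major.minor" rule is an assumption drawn from this thread. It relies on cuda.h defining CUDA_VERSION as major * 1000 + minor * 10.

```python
import re
from pathlib import Path


def toolkit_version(include_dir):
    """Return (major, minor) parsed from the CUDA_VERSION macro in cuda.h."""
    text = Path(include_dir, "cuda.h").read_text(errors="replace")
    match = re.search(r"^\s*#\s*define\s+CUDA_VERSION\s+(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError("CUDA_VERSION not found in cuda.h")
    value = int(match.group(1))
    # CUDA encodes the version as major * 1000 + minor * 10 (12020 -> 12.2).
    return value // 1000, (value % 1000) // 10


def can_build(bindings_version, include_dir):
    """Assumed rule: the bindings' major.minor must equal the toolkit's."""
    major, minor = (int(part) for part in bindings_version.split(".")[:2])
    return (major, minor) == toolkit_version(include_dir)
```

For example, can_build("12.4.0", "/usr/local/cuda-12.2/include") would report a mismatch before any Cython step runs.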

ichergui commented 1 month ago

Hi @jakirkham, I tried to build the latest changes in cuda-python with CUDA Toolkit 12.4.0, but it doesn't work. I think the setup.cfg file is missing. Please see below:

nvidia@tegra-ubuntu:~/temporary/cuda-python$ export CUDA_PATH="/usr/local/cuda-12.4"
nvidia@tegra-ubuntu:~/temporary/cuda-python$ python3 setup.py bdist_wheel --verbose --dist-dir ./dist
Parsing headers in "['/usr/local/cuda-12.4/include']" (Caching False)
Parsing driver headers
Parsing runtime headers
Parsing nvrtc headers
Generating cuda/cudart.pyx.in
Generating cuda/cnvrtc.pxd.in
Generating cuda/nvrtc.pyx.in
Generating cuda/ccuda.pyx.in
Generating cuda/cuda.pxd.in
Generating cuda/ccuda.pxd.in
Generating cuda/ccudart.pxd.in
Generating cuda/cnvrtc.pyx.in
Generating cuda/ccudart.pyx.in
Generating cuda/nvrtc.pxd.in
Generating cuda/cudart.pxd.in
Generating cuda/cuda.pyx.in
Generating cuda/_cuda/cnvrtc.pxd.in
Generating cuda/_cuda/ccuda.pyx.in
Generating cuda/_cuda/ccuda.pxd.in
Generating cuda/_cuda/cnvrtc.pyx.in
Generating cuda/_lib/utils.pxd.in
Generating cuda/_lib/utils.pyx.in
Generating cuda/_lib/ccudart/utils.pxd.in
Generating cuda/_lib/ccudart/ccudart.pxd.in
Generating cuda/_lib/ccudart/ccudart.pyx.in
Generating cuda/_lib/ccudart/utils.pyx.in
Compiling cuda/_cuda/ccuda.pyx because it changed.
Compiling cuda/_cuda/cnvrtc.pyx because it changed.
[1/2] Cythonizing cuda/_cuda/ccuda.pyx
[2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
Compiling cuda/_lib/utils.pyx because it changed.
[1/1] Cythonizing cuda/_lib/utils.pyx
Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
Compiling cuda/_lib/ccudart/utils.pyx because it changed.
[1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
[2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
Compiling cuda/ccuda.pyx because it changed.
Compiling cuda/ccudart.pyx because it changed.
Compiling cuda/cnvrtc.pyx because it changed.
Compiling cuda/cuda.pyx because it changed.
Compiling cuda/cudart.pyx because it changed.
Compiling cuda/nvrtc.pyx because it changed.
[1/6] Cythonizing cuda/ccuda.pyx
[2/6] Cythonizing cuda/ccudart.pyx
[3/6] Cythonizing cuda/cnvrtc.pyx
[4/6] Cythonizing cuda/cuda.pyx
[5/6] Cythonizing cuda/cudart.pyx
[6/6] Cythonizing cuda/nvrtc.pyx
Compiling cuda/tests/test_ccuda.pyx because it changed.
Compiling cuda/tests/test_ccudart.pyx because it changed.
Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
[1/3] Cythonizing cuda/tests/test_ccuda.pyx
[2/3] Cythonizing cuda/tests/test_ccudart.pyx
[3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
Traceback (most recent call last):
  File "/home/nvidia/temporary/cuda-python/setup.py", line 242, in <module>
    version=versioneer.get_version(),
  File "<string>", line 1871, in get_version
  File "<string>", line 1803, in get_versions
  File "<string>", line 414, in get_config_from_root
FileNotFoundError: [Errno 2] No such file or directory: '/home/nvidia/temporary/cuda-python/setup.cfg'
nvidia@tegra-ubuntu:~/temporary/cuda-python$
ichergui commented 1 month ago

I will fix this issue. I will send a PR shortly

leofang commented 1 month ago

setup.cfg should not be needed, since this project already supports the newer (PEP 517) build backend. What are your Python, pip, and setuptools versions?

If your goal is to build a wheel, could you please do pip wheel -v . instead of invoking setup.py directly (which is considered a deprecated/bad practice now)? This should unblock you without any changes in cuda-python.
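
For scripted builds, the same pip invocation can be driven from Python. This is a minimal sketch under stated assumptions: the helper names pip_wheel_command and build_wheel are hypothetical, and the output directory "dist" simply mirrors the --dist-dir used earlier in the thread. Only standard pip options (wheel, -v, --wheel-dir) are used.

```python
import subprocess
import sys


def pip_wheel_command(source_dir=".", wheel_dir="dist"):
    """Build the argument list for `pip wheel -v <source_dir>`."""
    # Running pip via `python -m pip` ties it to the current interpreter.
    return [sys.executable, "-m", "pip", "wheel", "-v",
            "--wheel-dir", wheel_dir, source_dir]


def build_wheel(source_dir=".", wheel_dir="dist"):
    """Run the build; raises CalledProcessError if pip exits non-zero."""
    subprocess.run(pip_wheel_command(source_dir, wheel_dir), check=True)
```

Calling build_wheel() from the repository root should behave like pip wheel -v . with the wheel written to ./dist.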

ichergui commented 1 month ago

Thanks @leofang for your reply. I will check that.

Are you planning to backport the fix mentioned by @jakirkham from 12.4.0 to 12.2.0? This is a blocker for the OE (OpenEmbedded) world, because we use the same CUDA Toolkit version as JetPack 6.0 / L4T R36.3.0 GA.

leofang commented 1 month ago

> Know there was an issue with the CUDA Graph Management API ( #55 ), which wasn't fixed until CUDA-Python 12.4.0

> Are you planning to back port the fix mentioned by @jakirkham from 12.4.0 to 12.2.0 ?

@vzhurba01 can correct me, but conditional graphs (which is what #55 was about) aren't supported until CUDA 12.4, so I don't know what there is to backport.

> Beginning in CUDA 12.4, CUDA Graphs supports conditional nodes, which enable the conditional or repeated execution of portions of a graph without returning control to the CPU. This frees up CPU resources, enabling many more workflows to be represented in a single graph.

(from https://developer.nvidia.com/blog/dynamic-control-flow-in-cuda-graphs-with-conditional-nodes/)

leofang commented 1 month ago

Also in case it's useful to solve your deployment needs: IIRC CUDA Python is designed with minor version compatibility in mind, so the latest CUDA Python should still work with older driver/toolkit from within the same major version. In theory we don't need backports.
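
The minor version compatibility idea can be illustrated with a small runtime check. This is a hedged sketch, not an official compatibility test: decode, same_major, and report are hypothetical helper names, and the check only compares major versions. cuda.cuDriverGetVersion() returns an (error, version) pair, with the version encoded as major * 1000 + minor * 10.

```python
from importlib import metadata


def decode(version):
    """Split a CUDA version integer such as 12020 into (12, 2)."""
    return version // 1000, (version % 1000) // 10


def same_major(bindings_version, driver_version):
    """True when the bindings and the driver share a CUDA major version."""
    return int(bindings_version.split(".")[0]) == decode(driver_version)[0]


def report():
    """Compare the installed cuda-python wheel with the local driver."""
    from cuda import cuda  # imported lazily so the helpers work without CUDA

    err, driver_version = cuda.cuDriverGetVersion()
    if int(err) != 0:
        raise RuntimeError(f"cuDriverGetVersion failed with code {int(err)}")
    bindings_version = metadata.version("cuda-python")
    return bindings_version, decode(driver_version), same_major(
        bindings_version, driver_version
    )
```

Note that this applies to running a prebuilt CUDA Python against an older driver; it says nothing about building from source against older headers.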

ichergui commented 1 month ago

I can try that and will let you know. Thanks @leofang

leofang commented 1 month ago

@ichergui how did it go?

ichergui commented 1 month ago

I was working on other things this morning. I will try that later today and share my progress. Thanks @leofang

ichergui commented 1 month ago

Hi @leofang

I tried to build cuda-python version 12.4.0 with CUDA Toolkit 12.2.0 and got the following error:

nvidia@tegra-ubuntu:~/cuda-python$ export CUDA_HOME="/usr/local/cuda-12.2/"
nvidia@tegra-ubuntu:~/cuda-python$ pip3 wheel -v .
Processing /home/nvidia/cuda-python
  Running command pip subprocess to install build dependencies
  Collecting setuptools
    Using cached setuptools-69.5.1-py3-none-any.whl (894 kB)
  Collecting versioneer[toml]==0.29
    Using cached versioneer-0.29-py3-none-any.whl (46 kB)
  Collecting cython
    Using cached Cython-3.0.10-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.5 MB)
  Collecting pyclibrary
    Using cached pyclibrary-0.2.2-py3-none-any.whl (560 kB)
  Collecting tomli
    Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
  Collecting pyparsing<4,>=2.3.1
    Using cached pyparsing-3.1.2-py3-none-any.whl (103 kB)
  Installing collected packages: versioneer, tomli, setuptools, pyparsing, cython, pyclibrary
  Successfully installed cython-3.0.10 pyclibrary-0.2.2 pyparsing-3.1.2 setuptools-69.5.1 tomli-2.0.1 versioneer-0.29
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  Parsing headers in "['/usr/local/cuda-12.2/include']" (Caching False)
  Parsing driver headers
  Parsing runtime headers
  Parsing nvrtc headers
  Generating cuda/nvrtc.pxd.in
  Generating cuda/cuda.pyx.in
  Generating cuda/nvrtc.pyx.in
  Generating cuda/cudart.pyx.in
  Generating cuda/cuda.pxd.in
  Generating cuda/cudart.pxd.in
  Generating cuda/ccudart.pxd.in
  Generating cuda/cnvrtc.pyx.in
  Generating cuda/ccuda.pxd.in
  Generating cuda/ccudart.pyx.in
  Generating cuda/ccuda.pyx.in
  Generating cuda/cnvrtc.pxd.in
  Generating cuda/_cuda/cnvrtc.pyx.in
  Generating cuda/_cuda/ccuda.pxd.in
  Generating cuda/_cuda/ccuda.pyx.in
  Generating cuda/_cuda/cnvrtc.pxd.in
  Generating cuda/_lib/utils.pxd.in
  Generating cuda/_lib/utils.pyx.in
  Generating cuda/_lib/ccudart/utils.pxd.in
  Generating cuda/_lib/ccudart/utils.pyx.in
  Generating cuda/_lib/ccudart/ccudart.pxd.in
  Generating cuda/_lib/ccudart/ccudart.pyx.in
  Compiling cuda/_cuda/ccuda.pyx because it changed.
  Compiling cuda/_cuda/cnvrtc.pyx because it changed.
  [1/2] Cythonizing cuda/_cuda/ccuda.pyx
  [2/2] Cythonizing cuda/_cuda/cnvrtc.pyx

  Error compiling Cython file:
  ------------------------------------------------------------
  ...
      getPtr()
          Get memory address of class instance

      """
      cdef ccuda.CUlaunchAttributeValue_union* _ptr
      cdef CUgraphDeviceNode _devNode
           ^
  ------------------------------------------------------------

  cuda/cuda.pxd:1037:9: 'CUgraphDeviceNode' is not a type identifier

  Error compiling Cython file:
  ------------------------------------------------------------
  ...
      cdef CUDA_EXT_SEM_SIGNAL_NODE_PARAMS_v2 _extSemSignal
      cdef CUDA_EXT_SEM_WAIT_NODE_PARAMS_v2 _extSemWait
      cdef CUDA_MEM_ALLOC_NODE_PARAMS_v2 _alloc
      cdef CUDA_MEM_FREE_NODE_PARAMS _free
      cdef CUDA_BATCH_MEM_OP_NODE_PARAMS_v2 _memOp
      cdef CUDA_CONDITIONAL_NODE_PARAMS _conditional
           ^
  ------------------------------------------------------------

  cuda/cuda.pxd:2924:9: 'CUDA_CONDITIONAL_NODE_PARAMS' is not a type identifier

  Error compiling Cython file:
  ------------------------------------------------------------
  ...

      # Return values
      cdef int _int
      cdef void* _handle
      cdef unsigned int _d3dkmt_handle
      cdef cuda.CUmemFabricHandle _mem_fabric_handle
           ^
  ------------------------------------------------------------

  cuda/_lib/utils.pxd:97:9: 'CUmemFabricHandle' is not a type identifier
  Compiling cuda/_lib/utils.pyx because it changed.
  [1/1] Cythonizing cuda/_lib/utils.pyx
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
      main()
    File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 143, in _get_build_requires
      self.run_setup()
    File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 158, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 217, in <module>
      extensions += do_cythonize(sources)
    File "setup.py", line 186, in do_cythonize
      return cythonize(
    File "/tmp/pip-build-env-qez9ebej/overlay/local/lib/python3.10/dist-packages/Cython/Build/Dependencies.py", line 1154, in cythonize
      cythonize_one(*args)
    File "/tmp/pip-build-env-qez9ebej/overlay/local/lib/python3.10/dist-packages/Cython/Build/Dependencies.py", line 1321, in cythonize_one
      raise CompileError(None, pyx_file)
  Cython.Compiler.Errors.CompileError: cuda/_lib/utils.pyx
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python3 /usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmpnvog9v84
  cwd: /home/nvidia/cuda-python
  Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
nvidia@tegra-ubuntu:~/cuda-python$ 
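
The three "is not a type identifier" errors in the log above name CUgraphDeviceNode, CUDA_CONDITIONAL_NODE_PARAMS, and CUmemFabricHandle, which suggests the 12.2 headers do not declare them. A minimal sketch to confirm this on a given toolkit follows. It assumes these driver types would be declared in cuda.h under the toolkit's include directory, and the helper name missing_identifiers is hypothetical.

```python
import re
from pathlib import Path

# Identifiers taken from the Cython errors in the log above.
REQUIRED = (
    "CUgraphDeviceNode",
    "CUDA_CONDITIONAL_NODE_PARAMS",
    "CUmemFabricHandle",
)


def missing_identifiers(include_dir, names=REQUIRED):
    """Return the names that never appear as whole words in cuda.h."""
    text = Path(include_dir, "cuda.h").read_text(errors="replace")
    return [
        name for name in names
        if not re.search(rf"\b{re.escape(name)}\b", text)
    ]
```

A non-empty result for /usr/local/cuda-12.2/include would be consistent with the failure shown here.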
leofang commented 1 month ago

@ichergui could you try:

ichergui commented 1 month ago

@leofang Is it possible to build cuda-python version 12.4.0 with CTK 12.2.0?

jakirkham commented 1 month ago

No, it is not.

jakirkham commented 1 month ago

Could you please let us know if there is some reason updating the CUDA Toolkit is not an option?

ichergui commented 1 month ago

Hi @jakirkham, I want to stay aligned with JetPack 6.0 / L4T R36.3.0 GA, which provides CUDA Toolkit version 12.2.0. Could you please help fix the conditional graphs issue and make cuda-python buildable with CUDA Toolkit version 12.2.0?

Thank you

ichergui commented 1 month ago

Thanks so much @jakirkham @leofang. I was able to build cuda-python from source (tag: v12.2.1) and run some tests successfully:

root@jetson-agx-orin-devkit-industrial:~# python3
Python 3.12.3 (main, Apr  9 2024, 08:09:14) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda import cudart
>>> from cuda import cuda
>>> cuda_init_result, = cuda.cuInit(0)
>>> print(cuda_init_result)
0
>>> device_count_result, num_devices = cuda.cuDeviceGetCount()
>>> print(device_count_result)
0
>>> print(num_devices)
1
>>> property_result, properties = cudart.cudaGetDeviceProperties(0)
>>> print("Is it Integrated GPU? :", properties.integrated)
Is it Integrated GPU? : 1
>>> ^D
root@jetson-agx-orin-devkit-industrial:~#
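
The interactive session above can also be written as a small script that checks each return code instead of printing it. This is a minimal sketch using only the calls shown in the session (cuInit, cuDeviceGetCount, cudaGetDeviceProperties); the helper names unwrap and smoke_test are hypothetical.

```python
def unwrap(result):
    """Check the leading error code of a cuda-python result tuple.

    Returns None, a single value, or a tuple of the remaining values.
    """
    err, *values = result
    if int(err) != 0:
        raise RuntimeError(f"CUDA call failed with error code {int(err)}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


def smoke_test():
    """Repeat the checks from the session: init, device count, properties."""
    from cuda import cuda, cudart  # imported lazily; requires a CUDA device

    unwrap(cuda.cuInit(0))
    num_devices = unwrap(cuda.cuDeviceGetCount())
    properties = unwrap(cudart.cudaGetDeviceProperties(0))
    return num_devices, bool(properties.integrated)
```

On the Jetson board from the session, smoke_test() would be expected to return one device with the integrated flag set.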