IBM / aihwkit

IBM Analog Hardware Acceleration Kit
https://aihwkit.readthedocs.io
MIT License

How to configure CUDA during installation? #398

Closed matifali closed 2 years ago

matifali commented 2 years ago

How can I configure CUDA during installation? I have torch 1.12 installed with CUDA enabled. But when I install aihwkit using

pip install -v aihwkit

it wants to install torch=1.8. I am using an Nvidia RTX A5000, which does not support torch=1.8.

Also, is this the only requirement or do I need to change something else to enable CUDA support?

maljoras commented 2 years ago

The package install does not support CUDA at the moment. For a CUDA installation you need to compile the code yourself. You can follow the development installation instructions. Which OS are you using? For Windows you can try this

On Linux, you might want to follow this and use

make build_inplace_cuda

which is a short form of

python setup.py build_ext -j8 -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE --inplace -DRPU_BLAS=MKL -DINTEL_MKL_DIR=${MKLROOT} -DUSE_CUDA=ON

matifali commented 2 years ago

I am using the pytorch/pytorch:latest docker image, which comes with pytorch=1.12 and python=3.7. I am getting these errors:

CMake Error in /home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-build/CMakeFiles/CMakeTmp/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "cmTC_d872e".

maljoras commented 2 years ago

You need to set the CUDA_ARCHITECTURES flag correctly, for instance with:

make build_cuda flags='-DRPU_CUDA_ARCHITECTURES="86"'

It looks like the RTX A5000 has cuda_arch 86.
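As a minimal sketch of where the value 86 comes from: the architecture string is the GPU's compute capability with the dot removed (8.6 becomes 86). The helper names below are hypothetical and not part of aihwkit; the capability pair itself is assumed to come from NVIDIA's documentation or, if a CUDA-enabled torch is already installed, from torch.cuda.get_device_capability().

```python
def cuda_arch_from_capability(major: int, minor: int) -> str:
    """Join a compute capability such as (8, 6) into the arch string "86"."""
    if major < 1 or not 0 <= minor <= 9:
        raise ValueError(f"unexpected compute capability: {major}.{minor}")
    return f"{major}{minor}"


def build_flags(major: int, minor: int) -> str:
    """Format the flag used in this thread for the given capability."""
    arch = cuda_arch_from_capability(major, minor)
    return f'-DRPU_CUDA_ARCHITECTURES="{arch}"'


if __name__ == "__main__":
    # (8, 6) is the capability assumed for the RTX A5000 in this thread.
    print(f"make build_cuda flags='{build_flags(8, 6)}'")
```

Running it prints the same make command as above, so it is only a convenience for other GPUs.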

matifali commented 2 years ago

Still the same error: CUDA_ARCHITECTURES is empty for target "cmTC_7572b"

maljoras commented 2 years ago

It looks like your CUDA library might not be installed properly. Or maybe try

make build_cuda flags='-DRPU_CUDA_ARCHITECTURES="86" -DCUDA_ARCHITECTURES="86"'

If that does not work, make sure that the build can find nvcc (you might have to set CUDA_HOME to the path where the CUDA compilers are installed). It would help if you posted the full error log.
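A quick way to check whether nvcc is discoverable is sketched below. It is an illustration only: the function name find_nvcc is hypothetical, and the assumption that nvcc lives under CUDA_HOME/bin reflects the usual toolkit layout rather than anything aihwkit's build scripts are known to do.

```python
import os
import shutil
from typing import Mapping, Optional


def find_nvcc(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path to nvcc, looking on PATH first and then in CUDA_HOME/bin."""
    env = os.environ if env is None else env

    # 1. Anything on PATH wins, as it would in a shell.
    on_path = shutil.which("nvcc", path=env.get("PATH", os.defpath))
    if on_path:
        return on_path

    # 2. Fall back to CUDA_HOME/bin/nvcc (assumed toolkit layout).
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_nvcc() or "nvcc not found: check PATH or set CUDA_HOME")
```

Finding the executable is necessary but not sufficient: CMake must also be able to run it successfully.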

matifali commented 2 years ago

This is the full log:

(aihwkit) atif@ibmaihwkit:~/aihwkit$ make build_cuda flags='-DRPU_CUDA_ARCHITECTURES="86"  -DCUDA_ARCHITECTURES="86"'
make build_mkl flags="-DUSE_CUDA=ON -DRPU_CUDA_ARCHITECTURES="86"  -DCUDA_ARCHITECTURES="86""
make[1]: Entering directory '/home/atif/aihwkit'
make build flags="-DRPU_BLAS=MKL -DINTEL_MKL_DIR=  -DUSE_CUDA=ON -DRPU_CUDA_ARCHITECTURES=86  -DCUDA_ARCHITECTURES=86"
make[2]: Entering directory '/home/atif/aihwkit'
python setup.py install --user -j8 -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE  -DRPU_BLAS=MKL -DINTEL_MKL_DIR=  -DUSE_CUDA=ON -DRPU_CUDA_ARCHITECTURES=86  -DCUDA_ARCHITECTURES=86
/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,

--------------------------------------------------------------------------------
-- Trying "Ninja" generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
-- Configuring incomplete, errors occurred!
See also "/home/atif/aihwkit/_cmake_test_compile/build/CMakeFiles/CMakeOutput.log".
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying "Ninja" generator - failure
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
-- Trying "Unix Makefiles" generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
Not searching for unused variables given on the command line.
-- The C compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is GNU 7.5.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/atif/aihwkit/_cmake_test_compile/build
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying "Unix Makefiles" generator - success
--------------------------------------------------------------------------------

Configuring Project
  Working directory:
    /home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-build
  Command:
    cmake /home/atif/aihwkit -G 'Unix Makefiles' -DCMAKE_INSTALL_PREFIX:PATH=/home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-install -DPYTHON_VERSION_STRING:STRING=3.7.13 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/skbuild/resources/cmake -DPython3_EXECUTABLE:FILEPATH=/home/atif/.conda/envs/aihwkit/bin/python -DPython3_INCLUDE_DIR:PATH=/home/atif/.conda/envs/aihwkit/include/python3.7m -DPython3_LIBRARY:PATH=/home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so -DPython3_NumPy_INCLUDE_DIRS:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/numpy/core/include -DPython_EXECUTABLE:FILEPATH=/home/atif/.conda/envs/aihwkit/bin/python -DPython_INCLUDE_DIR:PATH=/home/atif/.conda/envs/aihwkit/include/python3.7m -DPython_LIBRARY:PATH=/home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so -DPython_NumPy_INCLUDE_DIRS:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/numpy/core/include -DPYTHON_EXECUTABLE:FILEPATH=/home/atif/.conda/envs/aihwkit/bin/python -DPYTHON_INCLUDE_DIR:PATH=/home/atif/.conda/envs/aihwkit/include/python3.7m -DPYTHON_LIBRARY:PATH=/home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so -DPYTHON_NumPy_INCLUDE_DIRS:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/numpy/core/include -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DRPU_BLAS=MKL -DINTEL_MKL_DIR= -DUSE_CUDA=ON -DRPU_CUDA_ARCHITECTURES=86 -DCUDA_ARCHITECTURES=86 -DCMAKE_BUILD_TYPE:STRING=Release

-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Invoking cmake through scikit-build
-- The BLAS backend of choice:MKL
-- MKL_THREADING = OMP
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel_lp64: /home/atif/.conda/envs/aihwkit/lib/libmkl_intel_lp64.so
--   Library mkl_gnu_thread: /home/atif/.conda/envs/aihwkit/lib/libmkl_gnu_thread.so
--   Library mkl_core: /home/atif/.conda/envs/aihwkit/lib/libmkl_core.so
--   Library gomp: -fopenmp
--   Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
--   Library m: /usr/lib/x86_64-linux-gnu/libm.so
--   Library dl: /usr/lib/x86_64-linux-gnu/libdl.so
-- Looking for cblas_sgemm
-- Looking for cblas_sgemm - found
-- MKL library found
-- Performing Test C_HAS_AVX_1
-- Performing Test C_HAS_AVX_1 - Failed
-- Performing Test C_HAS_AVX_2
-- Performing Test C_HAS_AVX_2 - Success
-- Performing Test C_HAS_AVX2_1
-- Performing Test C_HAS_AVX2_1 - Failed
-- Performing Test C_HAS_AVX2_2
-- Performing Test C_HAS_AVX2_2 - Success
-- Performing Test CXX_HAS_AVX_1
-- Performing Test CXX_HAS_AVX_1 - Failed
-- Performing Test CXX_HAS_AVX_2
-- Performing Test CXX_HAS_AVX_2 - Success
-- Performing Test CXX_HAS_AVX2_1
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX2_2
-- Performing Test CXX_HAS_AVX2_2 - Success
-- AVX compiler support found
-- MKL include for RPU is /home/atif/.conda/envs/aihwkit/lib/libmkl_intel_lp64.so;/home/atif/.conda/envs/aihwkit/lib/libmkl_gnu_thread.so;/home/atif/.conda/envs/aihwkit/lib/libmkl_core.so;-fopenmp;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libdl.so;/home/atif/.conda/envs/aihwkit/lib/libmkl_avx2.so.1
-- Found PythonInterp: /home/atif/.conda/envs/aihwkit/bin/python (found version "3.7.13") 
-- Found PythonLibs: /home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so (found version "3.7.13") 
-- Found PythonInterp: /home/atif/.conda/envs/aihwkit/bin/python (found suitable version "3.7.13", minimum required is "3.6") 
-- Found PythonLibs: /home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /home/atif/.local/lib/python3.7/site-packages/pybind11/include (found version "2.10.0")
-- Found Python: /home/atif/.conda/envs/aihwkit/bin/python (found version "3.7.13") found components: Interpreter 
-- Found Torch: /home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/torch/include;/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/torch/include/torch/csrc/api/include  
-- Set _GLIBCXX_USE_CXX11_ABI=0
-- The CUDA compiler identification is unknown
-- Detecting CUDA compiler ABI info
CMake Error in /home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-build/CMakeFiles/CMakeTmp/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "cmTC_93b3f".

CMake Error in /home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-build/CMakeFiles/CMakeTmp/CMakeLists.txt:
  CUDA_ARCHITECTURES is empty for target "cmTC_93b3f".

CMake Error at /home/atif/.conda/envs/aihwkit/share/cmake-3.22/Modules/CMakeDetermineCompilerABI.cmake:49 (try_compile):
  Failed to generate test project build system.
Call Stack (most recent call first):
  /home/atif/.conda/envs/aihwkit/share/cmake-3.22/Modules/CMakeTestCUDACompiler.cmake:19 (CMAKE_DETERMINE_COMPILER_ABI)
  cmake/dependencies_cuda.cmake:15 (enable_language)
  CMakeLists.txt:40 (include)

-- Configuring incomplete, errors occurred!
See also "/home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-build/CMakeFiles/CMakeOutput.log".
See also "/home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-build/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
  File "/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/skbuild/setuptools_wrap.py", line 642, in setup
    languages=cmake_languages,
  File "/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/skbuild/cmaker.py", line 339, in configure
    os.path.abspath(CMAKE_BUILD_DIR()),

An error occurred while configuring with CMake.
  Command:
    cmake /home/atif/aihwkit -G 'Unix Makefiles' -DCMAKE_INSTALL_PREFIX:PATH=/home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-install -DPYTHON_VERSION_STRING:STRING=3.7.13 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/skbuild/resources/cmake -DPython3_EXECUTABLE:FILEPATH=/home/atif/.conda/envs/aihwkit/bin/python -DPython3_INCLUDE_DIR:PATH=/home/atif/.conda/envs/aihwkit/include/python3.7m -DPython3_LIBRARY:PATH=/home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so -DPython3_NumPy_INCLUDE_DIRS:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/numpy/core/include -DPython_EXECUTABLE:FILEPATH=/home/atif/.conda/envs/aihwkit/bin/python -DPython_INCLUDE_DIR:PATH=/home/atif/.conda/envs/aihwkit/include/python3.7m -DPython_LIBRARY:PATH=/home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so -DPython_NumPy_INCLUDE_DIRS:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/numpy/core/include -DPYTHON_EXECUTABLE:FILEPATH=/home/atif/.conda/envs/aihwkit/bin/python -DPYTHON_INCLUDE_DIR:PATH=/home/atif/.conda/envs/aihwkit/include/python3.7m -DPYTHON_LIBRARY:PATH=/home/atif/.conda/envs/aihwkit/lib/libpython3.7m.so -DPYTHON_NumPy_INCLUDE_DIRS:PATH=/home/atif/.conda/envs/aihwkit/lib/python3.7/site-packages/numpy/core/include -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DRPU_BLAS=MKL -DINTEL_MKL_DIR= -DUSE_CUDA=ON -DRPU_CUDA_ARCHITECTURES=86 -DCUDA_ARCHITECTURES=86 -DCMAKE_BUILD_TYPE:STRING=Release
  Source directory:
    /home/atif/aihwkit
  Working directory:
    /home/atif/aihwkit/_skbuild/linux-x86_64-3.7/cmake-build
Please see CMake's output for more information.
make[2]: *** [Makefile:22: build] Error 1
make[2]: Leaving directory '/home/atif/aihwkit'
make[1]: *** [Makefile:25: build_mkl] Error 2
make[1]: Leaving directory '/home/atif/aihwkit'
make: *** [Makefile:28: build_cuda] Error 2

maljoras commented 2 years ago

CMake cannot find a working CUDA installation; note the line "-- The CUDA compiler identification is unknown" in your log. You need to make sure that nvcc is found and that the CUDA library is properly installed on your system. Take a look at the instructions.
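To spot this symptom in a long build log without reading it line by line, a small scan like the following can help. It is a sketch: the function name is hypothetical, and the pattern is based only on the message format seen in the log in this thread.

```python
import re
from typing import List

# Matches CMake status lines such as
# "-- The CUDA compiler identification is unknown".
_UNKNOWN_ID = re.compile(r"^-- The (\w+) compiler identification is unknown$")


def unidentified_compilers(log_text: str) -> List[str]:
    """Return the languages whose compiler CMake could not identify."""
    languages = []
    for line in log_text.splitlines():
        match = _UNKNOWN_ID.match(line.strip())
        if match:
            languages.append(match.group(1))
    return languages


if __name__ == "__main__":
    excerpt = (
        "-- The C compiler identification is GNU 7.5.0\n"
        "-- Set _GLIBCXX_USE_CXX11_ABI=0\n"
        "-- The CUDA compiler identification is unknown\n"
    )
    print(unidentified_compilers(excerpt))
```

A non-empty result for CUDA means the later "CUDA_ARCHITECTURES is empty" error is likely a consequence, not the root cause.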

matifali commented 2 years ago

I can run nvcc; it is on the PATH. The output of which nvcc is

/home/atif/.conda/envs/aihwkit/bin/nvcc

and the output of nvcc --version is

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

maljoras commented 2 years ago

CUDA 11.7 is very new, so the build might have issues with it. In any case, your CUDA installation seems to have problems, since CMake could not find the compiler version string in the output above.

Also, your compiler (GNU 7.5.0) is very old. I am not sure whether CUDA 11.7 supports such an old compiler.
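To compare the two versions against NVIDIA's published host-compiler support table, they can be pulled out of the nvcc --version output and the CMake log as sketched below. The function names are hypothetical, and the sketch deliberately does not encode any compatibility rule, since that table is not part of this thread.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(version_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a line like 'Cuda compilation tools, release 11.7, V11.7.64'."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_gnu_version(cmake_log: str) -> Optional[Tuple[int, ...]]:
    """Extract the GNU C++ compiler version reported by CMake, e.g. (7, 5, 0)."""
    match = re.search(
        r"The CXX compiler identification is GNU (\d+(?:\.\d+)*)", cmake_log
    )
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    nvcc_text = "Cuda compilation tools, release 11.7, V11.7.64"
    cmake_text = "-- The CXX compiler identification is GNU 7.5.0"
    print("CUDA release:", parse_nvcc_release(nvcc_text))
    print("GNU version:", parse_gnu_version(cmake_text))
```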

matifali commented 2 years ago

This was an issue with CUDA. I used the nvidia/cuda:11.7.0-devel-ubuntu22.04 docker image as the base, and everything worked out. I suggest providing docker images with and without CUDA for a more straightforward setup. I can contribute.

maljoras commented 2 years ago

Hi @matifali, great that it worked out!

Indeed, we are in the process of providing conda pre-builds for several CUDA versions. However, if you could contribute docker images or describe an easy way to install aihwkit using docker images, that would be highly appreciated!

matifali commented 2 years ago

That's good news if pre-built images can be provided. Also, I would be happy to share my docker image source as a pull request. Thank you for your help.

matifali commented 2 years ago

@maljoras Please check #403

matifali commented 2 years ago

Closed now that #403 is merged. You are encouraged to use the docker version for CUDA support.