Closed vincentmr closed 1 year ago
Hi @carterbox, I could help fix the paths, but I'm curious whether you think there is a way forward regarding the GPU architectures. I'm thinking of using a multiple-output recipe, one for each arch. Would that work?
Yes, it would be helpful if you were to work on replacing any build prefix paths with the appropriate paths.
Issue
I tried installing Lightning-Kokkos (L-Kokkos) on top of Kokkos (with the CUDA backend). I'm using Perlmutter, a GPU cluster at NERSC. Using CUDA-12 (not officially supported by L-Kokkos), I ran into the following issues:
- The compiler x86_64-conda-linux-gnu-c++ is set as the Kokkos_CXX_COMPILER. On Perlmutter, the standard C++ compiler is CC. One must modify lib/cmake/Kokkos/KokkosConfigCommon.cmake and bin/kokkos_launch_compiler accordingly.
x86_64-conda-linux-gnu-c++ is the name of the conda-forge provided compiler for x86, which you can install via the gxx_linux-64 package. The cxx-compiler package is a meta package that will match the platform of the environment. I only intended for downstream users to use the conda-forge provided compilers and this package to compile more conda-forge packages, not to provide kokkos as a build tool for general use.
We could probably use $ENV{CXX} instead for greater compatibility? AFAIK, $CC is usually an environment variable for the C compiler, not the C++ compiler.
- The targets CUDA::cudart, CUDA::cuda_driver have unpatched INTERFACE_INCLUDE_DIRECTORIES still pointing to /home/conda/feedstock_root/build_artifacts/kokkos_1687386826792/_build_env/targets/x86_64-linux/include.
In which file is this path? Maybe this is something that needs to be addressed upstream? CMake already has modules to find these libraries, so I'm not sure why they need to hardcode the locations in this package.
Installing Kokkos from source to go through, trying to execute on Perlmutter's A100 GPUs, I get errors like
Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability 7.0 on device with compute capability 8.0 is not supported by CUDA!
unless compiling with -DKokkos_ARCH_AMPERE80=ON. It is not possible to target multiple GPU architectures while building Kokkos. So we should either target something recent, like AMPERE80, or build multiple targets by building different libs.
The reason that I decided it was OK to ship Kokkos with CUDA enabled is that I realized that we can set the compile options to include PTX with the kokkos shared objects. Thus by compiling for the lowest compute capability (35), the libraries should be compatible with any later devices via JIT compilation by the CUDA driver. This is not best for performance, but since Conda doesn't track CUDA archs, we must build for compatibility. Also, Kokkos doesn't allow targeting more than one CUDA arch because of compile time optimizations.
I'm not sure what options you used, but they are not the same because the conda-forge package is compiled for 35 or 50 depending on CUDA version. I suspect that by default Kokkos does not include PTX.
I'm thinking of using a multiple-output recipe, one for each arch. Would that work?
No. Conda doesn't track CUDA archs.
In which file is this path? Maybe this is something that needs to be addressed upstream? CMake already has modules to find these libraries, so I'm not sure why they need to hardcode the locations in this package.
I just checked the cmake files in $PREFIX/lib/cmake/Kokkos for kokkos 4.0.01 h1e7fabd_1. They do not mention the build_artifacts prefix anywhere.
Also, looking again at the KokkosConfig.cmake file, it seems that if you call find_package(CUDAToolkit) in your downstream project (and it succeeds), the following blocks are skipped:
IF(NOT TARGET CUDA::cudart)
  ADD_LIBRARY(CUDA::cudart UNKNOWN IMPORTED)
  SET_TARGET_PROPERTIES(CUDA::cudart PROPERTIES
    IMPORTED_LOCATION "/usr/local/cuda/lib64/libcudart.so"
    INTERFACE_INCLUDE_DIRECTORIES "/usr/local/cuda/include"
  )
ENDIF()
IF(NOT TARGET CUDA::cuda_driver)
  ADD_LIBRARY(CUDA::cuda_driver UNKNOWN IMPORTED)
  SET_TARGET_PROPERTIES(CUDA::cuda_driver PROPERTIES
    IMPORTED_LOCATION "/usr/lib64/libcuda.so"
    INTERFACE_INCLUDE_DIRECTORIES "/usr/local/cuda/include"
  )
ENDIF()
So I retract my suggestion that something needs to be addressed upstream because the Kokkos developers have given us a way in which to avoid these hardcoded paths.
Yes, it would be helpful if you were to work on replacing any build prefix paths with the appropriate paths.
Issue
I tried installing Lightning-Kokkos (L-Kokkos) on top of Kokkos (with the CUDA backend). I'm using Perlmutter, a GPU cluster at NERSC. Using CUDA-12 (not officially supported by L-Kokkos), I ran into the following issues:
- The compiler x86_64-conda-linux-gnu-c++ is set as the Kokkos_CXX_COMPILER. On Perlmutter, the standard C++ compiler is CC. One must modify lib/cmake/Kokkos/KokkosConfigCommon.cmake and bin/kokkos_launch_compiler accordingly.
x86_64-conda-linux-gnu-c++ is the name of the conda-forge provided compiler for x86, which you can install via the gxx_linux-64 package. The cxx-compiler package is a meta package that will match the platform of the environment. I only intended for downstream users to use the conda-forge provided compilers and this package to compile more conda-forge packages, not to provide kokkos as a build tool for general use.
We could probably use $ENV{CXX} instead for greater compatibility? AFAIK, $CC is usually an environment variable for the C compiler, not the C++ compiler.
Got it. Let me try this out. Also, fyi, Perlmutter's compiler wrappers are awkwardly called ftn, cc and CC for Fortran, C and C++ respectively. So one has to type a lot of CC=cc CXX=CC cmake -B build-like commands.
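Setting the variables once per shell session avoids repeating that prefix. This is a minimal sketch, assuming the wrapper names cc and CC given above. The `:=` form keeps any value that is already set (for example by an activated conda compiler package), so it acts as a default rather than an override.

```shell
# Sketch: default to Perlmutter's wrappers unless CC/CXX are already set.
: "${CC:=cc}"    # C compiler wrapper
: "${CXX:=CC}"   # C++ compiler wrapper
export CC CXX

echo "CC=$CC CXX=$CXX"

# CMake reads CC and CXX only on the first configure of a build directory:
#   cmake -B build
```

Because CMake caches the compiler choice, changing CC or CXX later has no effect on an existing build directory; a fresh one is needed.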
- The targets CUDA::cudart, CUDA::cuda_driver have unpatched INTERFACE_INCLUDE_DIRECTORIES still pointing to /home/conda/feedstock_root/build_artifacts/kokkos_1687386826792/_build_env/targets/x86_64-linux/include.
In which file is this path? Maybe this is something that needs to be addressed upstream? CMake already has modules to find these libraries, so I'm not sure why they need to hardcode the locations in this package.
This is in cuda12/lib/cmake/Kokkos/KokkosConfig.cmake.
Installing Kokkos from source to go through, trying to execute on Perlmutter's A100 GPUs, I get errors like
Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability 7.0 on device with compute capability 8.0 is not supported by CUDA!
unless compiling with -DKokkos_ARCH_AMPERE80=ON. It is not possible to target multiple GPU architectures while building Kokkos. So we should either target something recent, like AMPERE80, or build multiple targets by building different libs.
The reason that I decided it was OK to ship Kokkos with CUDA enabled is that I realized that we can set the compile options to include PTX with the kokkos shared objects. Thus by compiling for the lowest compute capability (35), the libraries should be compatible with any later devices via JIT compilation by the CUDA driver. This is not best for performance, but since Conda doesn't track CUDA archs, we must build for compatibility. Also, Kokkos doesn't allow targeting more than one CUDA arch because of compile time optimizations.
I'll check whether we can do something with PTX.
This is in cuda12/lib/cmake/Kokkos/KokkosConfig.cmake.
Oh! These unpatched build prefixes are only for the CUDA 12 package. We should probably still patch them or replace this target with an error telling the user to use find(CUDA_Toolkit)
I believe that PTX forward compatibility may only be available for kokkos 4, so I will probably have to pull the CUDA builds for 3.x.
https://github.com/kokkos/kokkos/issues/5439 https://github.com/kokkos/kokkos/issues/3612
I think that's right. I found this PR, which removes the blocking condition. I don't know enough about CUDA and Kokkos to know whether Kokkos really supports this forward compatibility in the end.
With CUDA-12, trying to find(CUDA_Toolkit), I have the following issue
CMake Warning at CMakeLists.txt:109 (find_package):
By not providing "FindCUDA_Toolkit.cmake" in CMAKE_MODULE_PATH this project
has asked CMake to find a package configuration file provided by
"CUDA_Toolkit", but CMake did not find one.
Could not find a package configuration file provided by "CUDA_Toolkit" with
any of the following names:
CUDA_ToolkitConfig.cmake
cuda_toolkit-config.cmake
which can be resolved with cp libcudacxx-config.cmake CUDA_ToolkitConfig.cmake. Should we create a symlink so that the file has the name expected by CMake?
Then
-- Found libcudacxx: /global/homes/v/vincentm/mambaforge/envs/cuda12/targets/x86_64-linux/lib/cmake/libcudacxx/CUDA_ToolkitConfig.cmake
-- Found existing Kokkos libraries
-- pybind11 v2.10.1
-- Configuring done (0.7s)
CMake Error in pennylane_lightning_kokkos/src/simulator/CMakeLists.txt:
Imported target "Kokkos::kokkos" includes non-existent path
"/home/conda/feedstock_root/build_artifacts/kokkos_1687972725614/_build_env/targets/x86_64-linux/include"
in its INTERFACE_INCLUDE_DIRECTORIES.
which is the problem described above. I have the following hits
# grep -r kokkos_1687972725614 .
./lib/cmake/Kokkos/KokkosConfig.cmake:INTERFACE_INCLUDE_DIRECTORIES "/home/conda/feedstock_root/build_artifacts/kokkos_1687972725614/_build_env/targets/x86_64-linux/include"
./lib/cmake/Kokkos/KokkosConfig.cmake:INTERFACE_INCLUDE_DIRECTORIES "/home/conda/feedstock_root/build_artifacts/kokkos_1687972725614/_build_env/targets/x86_64-linux/include"
After fixing them it compiles. Since we are already patching the lib paths, I think we should patch the include paths as well.
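The manual fix can be sketched as a sed substitution. The thread refers to an existing sed patch for the library paths but does not show it, so the following is not the feedstock's actual patch. It runs on a throwaway stand-in file, and it assumes the environment root (CONDA_PREFIX, with a hypothetical fallback path) is the right replacement for the build prefix.

```shell
# Sketch only: works on a throwaway copy, not on a real installation.
workdir="$(mktemp -d)"
config="$workdir/KokkosConfig.cmake"

# Build prefix as reported by grep in this thread.
stale='/home/conda/feedstock_root/build_artifacts/kokkos_1687972725614/_build_env'

# Hypothetical stand-in for the affected lines of lib/cmake/Kokkos/KokkosConfig.cmake.
cat > "$config" <<EOF
SET_TARGET_PROPERTIES(CUDA::cudart PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "$stale/targets/x86_64-linux/include"
)
EOF

# Assumption: the environment root is the correct replacement prefix.
env_prefix="${CONDA_PREFIX:-/opt/conda/envs/example}"

# '|' is the sed delimiter because both paths contain '/'.
sed "s|$stale|$env_prefix|g" "$config" > "$config.patched"
mv "$config.patched" "$config"

# Show the patched line.
grep INTERFACE_INCLUDE_DIRECTORIES "$config"
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.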
CUDAToolkit not CUDA_Toolkit, and CMake 3.17 or later.
Thanks @carterbox for pointing this out. I'm using
> cmake --version
cmake version 3.26.4
CMake finds CUDAToolkit now. So one needs to insert find_package(CUDAToolkit) before find_package(Kokkos). Should we then remove the sed patch altogether, or still patch the includes too?
I also get the following runtime error
E RuntimeError: Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory
when trying to do anything. This is unrelated, but I was wondering whether you could successfully run some test?
I tried building and running the project in the examples (build_cmake_installed). First, I created the following environment:
mamba create -n kokkos cxx-compiler cuda-compiler fortran-compiler cmake ninja kokkos=4
Then I edited the example to call find_package(CUDAToolkit) before finding kokkos.
Finally, I configured and built the project using cmake and ninja.
Running the example works.
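Those steps might look roughly like the sketch below. The project name, source file name (main.cpp, not created here) and directory layout are assumptions rather than the contents of the Kokkos example. The configure and build commands are left as comments because they need the conda environment and a CUDA toolchain.

```shell
# Sketch only: project name, file names and directory layout are assumptions.
proj="$(mktemp -d)"

# Minimal downstream CMakeLists.txt. CUDAToolkit is located first so that
# CUDA::cudart and CUDA::cuda_driver already exist when Kokkos is found.
cat > "$proj/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.17)  # FindCUDAToolkit needs CMake >= 3.17
project(example LANGUAGES CXX)

find_package(CUDAToolkit REQUIRED)
find_package(Kokkos REQUIRED)

add_executable(example main.cpp)
target_link_libraries(example PRIVATE Kokkos::kokkos)
EOF

echo "wrote $proj/CMakeLists.txt"

# In the activated environment, configure and build with, for example:
#   cmake -S "$proj" -B "$proj/build" -G Ninja
#   cmake --build "$proj/build"
```

The order of the two find_package calls is the point: the hardcoded fallback targets in KokkosConfig.cmake are only created when the CUDA:: targets do not exist yet.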
Did the same with the CUDA 11.2 build of Kokkos 4 by creating the following environment:
mamba create -n kokkos2 cxx-compiler gxx=10 kokkos=4 cuda-version=11.2 cmake ninja fortran-compiler
Also works. I do get warning messages about performance because the conda-forge packages were built for SM35 and SM50, but my device is SM61.
Should we then remove the sed patch altogether, or still patch the includes too?
I think we should replace the hardcoded paths with an error telling the user to call find_package(CUDAToolkit). Something like:
IF(NOT TARGET CUDA::cudart)
  MESSAGE(FATAL_ERROR "The CUDA::cudart target was not found; use find_package(CUDAToolkit REQUIRED) before find_package(Kokkos).")
ENDIF()
IF(NOT TARGET CUDA::cuda_driver)
  MESSAGE(FATAL_ERROR "The CUDA::cuda_driver target was not found; use find_package(CUDAToolkit REQUIRED) before find_package(Kokkos).")
ENDIF()
It's probably easier than trying to guess about the end user's conda environment.
Running the example works.
Good. I guess there is an issue with my examples (too large maybe?).
I think we should replace the hardcoded paths with an error telling the user to call find_package(CUDAToolkit). Something like:
I like that solution. I think we can close this and implement your fix.
Solution to issue cannot be found in the documentation.
Issue
I tried installing Lightning-Kokkos (L-Kokkos) on top of Kokkos (with the CUDA backend). I'm using Perlmutter, a GPU cluster at NERSC. Using CUDA-12 (not officially supported by L-Kokkos), I ran into the following issues:
- The compiler x86_64-conda-linux-gnu-c++ is set as the Kokkos_CXX_COMPILER. On Perlmutter, the standard C++ compiler is CC. One must modify lib/cmake/Kokkos/KokkosConfigCommon.cmake and bin/kokkos_launch_compiler accordingly.
- The targets CUDA::cudart, CUDA::cuda_driver have unpatched INTERFACE_INCLUDE_DIRECTORIES still pointing to /home/conda/feedstock_root/build_artifacts/kokkos_1687386826792/_build_env/targets/x86_64-linux/include.
Using CUDA-11, I ran into the following issues:
- The compiler x86_64-conda-linux-gnu-c++ is set as the Kokkos_CXX_COMPILER. On Perlmutter, the standard C++ compiler is CC. One must modify lib/cmake/Kokkos/KokkosConfigCommon.cmake and bin/kokkos_launch_compiler accordingly.
- The target CUDA::cuda_driver has INTERFACE_INCLUDE_DIRECTORIES "/usr/local/cuda/include", which should be patched to point to the env's include files.
After these fixes, everything compiles, but I get an error
Installing Kokkos from source to go through, trying to execute on Perlmutter's A100 GPUs, I get errors like
Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability 7.0 on device with compute capability 8.0 is not supported by CUDA!
unless compiling with -DKokkos_ARCH_AMPERE80=ON. It is not possible to target multiple GPU architectures while building Kokkos. So we should either target something recent, like AMPERE80, or build multiple targets by building different libs.
Installed packages
Environment info