ECP-copa / Cabana

Performance-portable library for particle-based simulations

Cabana installation issue #741

Closed dineshadepu closed 1 month ago

dineshadepu commented 1 month ago

Hi all,

I am reinstalling Cabana from scratch, using the latest versions of both Kokkos and Cabana. Kokkos installed successfully, but I am running into problems when building Cabana. Could you please help me with this? Here are the details.

I have installed the latest CUDA version, 12.4.

|  HPC LAB UoS => nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

The cmake command I used is

|  HPC LAB UoS => cmake         -D CMAKE_BUILD_TYPE="Release"      -D CMAKE_PREFIX_PATH="$KOKKOS_INSTALL_DIR;$SILO_INSTALL_DIR;$SILO_INCLUDE_DIR;$SILO_LIBRARY;$HDF_INSTALL_DIR"     -D CMAKE_INSTALL_PREFIX=$CABANA_INSTALL_DIR      -D CMAKE_CXX_COMPILER=$KOKKOS_SRC_DIR/bin/nvcc_wrapper      -D Cabana_REQUIRE_CUDA=ON      -D Cabana_REQUIRE_MPI=ON   -D Cabana_ENABLE_EXAMPLES=ON      -D Cabana_ENABLE_TESTING=OFF      -D Cabana_REQUIRE_SILO=ON      -D Cabana_REQUIRE_HDF5=ON      -D Cabana_ENABLE_PERFORMANCE_TESTING=OFF      .. ;
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/adepudinesh/post_doc/softwares/kokkos/bin/nvcc_wrapper - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Enabled Kokkos devices: OPENMP;CUDA;SERIAL
CMake Warning at /home/adepudinesh/post_doc/softwares/kokkos/build/install/lib/cmake/Kokkos/KokkosConfigCommon.cmake:59 (MESSAGE):
  The installed Kokkos configuration does not support CXX extensions.
  Forcing -DCMAKE_CXX_EXTENSIONS=Off
Call Stack (most recent call first):
  /home/adepudinesh/post_doc/softwares/kokkos/build/install/lib/cmake/Kokkos/KokkosConfig.cmake:57 (INCLUDE)
  CMakeLists.txt:39 (find_package)

-- Found Kokkos_DEVICES: CUDA  
-- Found Kokkos_OPTIONS: CUDA_LAMBDA  
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/libmpichcxx.so (found version "4.0") 
-- Found MPI: TRUE (found version "4.0")  
-- Could NOT find CLANG_FORMAT: Found unsuitable version "0.0", but required is at least "14" (found CLANG_FORMAT_EXECUTABLE-NOTFOUND)
-- Found SILO: /home/adepudinesh/post_doc/softwares/Silo/include  
-- The C compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found HDF5: /home/dinesh/post_doc/softwares/hdf5/install/lib/libhdf5.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.14.3") found components: C 
-- Performing Test COMPILER_SUPPORTS_MARCH
-- Performing Test COMPILER_SUPPORTS_MARCH - Success
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Cabana Revision = '9a1ad6050d1f51ff1590e2284a7b1223677a2f32'
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/adepudinesh/post_doc/softwares/Cabana/build

After running make install, compilation proceeds for a while and then fails at the link step with the following output

|  HPC LAB UoS => make install
[  1%] Building CXX object example/core_tutorial/01_hello_world/CMakeFiles/HelloWorld.dir/hello_world.cpp.o
[  3%] Linking CXX executable HelloWorld
[  3%] Built target HelloWorld
[  5%] Building CXX object example/core_tutorial/02_tuple/CMakeFiles/Tuple.dir/tuple_example.cpp.o
[  6%] Linking CXX executable Tuple
[  6%] Built target Tuple
[  8%] Building CXX object example/core_tutorial/03_struct_of_arrays/CMakeFiles/StructOfArrays.dir/soa_example.cpp.o
[ 10%] Linking CXX executable StructOfArrays
[ 10%] Built target StructOfArrays
[ 11%] Building CXX object example/core_tutorial/04_aosoa_advanced_unmanaged/CMakeFiles/AdvancedUnmanagedAoSoA.dir/advanced_aosoa_unmanaged.cpp.o
[ 13%] Linking CXX executable AdvancedUnmanagedAoSoA
[ 13%] Built target AdvancedUnmanagedAoSoA
[ 15%] Building CXX object example/core_tutorial/04_aosoa/CMakeFiles/ArrayOfStructsOfArrays.dir/aosoa_example.cpp.o
[ 16%] Linking CXX executable ArrayOfStructsOfArrays
[ 16%] Built target ArrayOfStructsOfArrays
[ 18%] Building CXX object example/core_tutorial/05_slice/CMakeFiles/Slice.dir/slice_example.cpp.o
[ 20%] Linking CXX executable Slice
[ 20%] Built target Slice
[ 21%] Building CXX object example/core_tutorial/06_deep_copy/CMakeFiles/DeepCopy.dir/deep_copy_example.cpp.o
[ 23%] Linking CXX executable DeepCopy
lto-wrapper: warning: using serial compilation of 4 LTRANS jobs
/usr/bin/ld: /tmp/cclOPrYc.ltrans3.ltrans.o:(.nvFatBinSegment+0x8): undefined reference to `fatbinData'
collect2: error: ld returned 1 exit status
make[2]: *** [example/core_tutorial/06_deep_copy/CMakeFiles/DeepCopy.dir/build.make:110: example/core_tutorial/06_deep_copy/DeepCopy] Error 1
make[1]: *** [CMakeFiles/Makefile2:887: example/core_tutorial/06_deep_copy/CMakeFiles/DeepCopy.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Many thanks :)

dineshadepu commented 1 month ago

Update

The installation works with the CPU build, which prints the same lto-wrapper warning but links without error.

|  HPC LAB UoS => make install
[  1%] Building CXX object example/core_tutorial/01_hello_world/CMakeFiles/HelloWorld.dir/hello_world.cpp.o
[  3%] Linking CXX executable HelloWorld
[  3%] Built target HelloWorld
[  5%] Building CXX object example/core_tutorial/02_tuple/CMakeFiles/Tuple.dir/tuple_example.cpp.o
[  6%] Linking CXX executable Tuple
[  6%] Built target Tuple
[  8%] Building CXX object example/core_tutorial/03_struct_of_arrays/CMakeFiles/StructOfArrays.dir/soa_example.cpp.o
[ 10%] Linking CXX executable StructOfArrays
[ 10%] Built target StructOfArrays
[ 12%] Building CXX object example/core_tutorial/04_aosoa_advanced_unmanaged/CMakeFiles/AdvancedUnmanagedAoSoA.dir/advanced_aosoa_unmanaged.cpp.o
[ 13%] Linking CXX executable AdvancedUnmanagedAoSoA
[ 13%] Built target AdvancedUnmanagedAoSoA
[ 15%] Building CXX object example/core_tutorial/04_aosoa/CMakeFiles/ArrayOfStructsOfArrays.dir/aosoa_example.cpp.o
[ 17%] Linking CXX executable ArrayOfStructsOfArrays
[ 17%] Built target ArrayOfStructsOfArrays
[ 18%] Building CXX object example/core_tutorial/05_slice/CMakeFiles/Slice.dir/slice_example.cpp.o
[ 20%] Linking CXX executable Slice
[ 20%] Built target Slice
[ 22%] Building CXX object example/core_tutorial/06_deep_copy/CMakeFiles/DeepCopy.dir/deep_copy_example.cpp.o
[ 24%] Linking CXX executable DeepCopy
lto-wrapper: warning: using serial compilation of 2 LTRANS jobs
[ 24%] Built target DeepCopy
[ 25%] Building CXX object example/core_tutorial/07_sorting/CMakeFiles/Sorting.dir/sorting_example.cpp.o
[ 27%] Linking CXX executable Sorting

The warning turns into an error for the GPU build

lto-wrapper: warning: using serial compilation of 2 LTRANS jobs

streeve commented 1 month ago

I'm not familiar with the error, but a quick search shows that maybe --disable-flto may help at configuration time. This is a more general compilation issue than Cabana, but let me know if you find anything out that we can do to help on our side.

dineshadepu commented 1 month ago

Thank you for the suggestion, @streeve. I have updated my NVIDIA drivers, my CUDA version, and even my gcc, but the error does not go away. I do not have much experience with CMakeLists.txt. Could you please tell me how to pass this flag to the current build process? That would be very helpful. I have tried a few ways, but couldn't get it working.

dineshadepu commented 1 month ago

This bug seems to occur in a lot of other projects as well. I found a similar report here: https://gromacs.bioexcel.eu/t/error-installing-gromacs-2022-c-one-definition-rule/5322/10, where the user is facing a similar problem.

The solution is as follows:

First, run the usual cmake command to configure Cabana:

cmake \
    -D CMAKE_BUILD_TYPE="Release" \
    -D CMAKE_PREFIX_PATH="$KOKKOS_INSTALL_DIR;$SILO_INSTALL_DIR;$SILO_INCLUDE_DIR;$SILO_LIBRARY;$HDF_INSTALL_DIR" \
    -D CMAKE_INSTALL_PREFIX=$CABANA_INSTALL_DIR \
    -D CMAKE_CXX_COMPILER=$KOKKOS_SRC_DIR/bin/nvcc_wrapper \
    -D Cabana_REQUIRE_CUDA=ON \
    -D Cabana_REQUIRE_MPI=ON \
    -D Cabana_ENABLE_EXAMPLES=ON \
    -D Cabana_ENABLE_TESTING=OFF \
    -D Cabana_REQUIRE_SILO=ON \
    -D Cabana_REQUIRE_HDF5=ON \
    -D Cabana_ENABLE_PERFORMANCE_TESTING=OFF \
    -D Cabana_ENABLE_CAJITA=ON \
    ..

Then run cmake a second time in the same build directory, clearing the compile options picked up from MPI:

cmake .. -DMPI_CXX_COMPILE_OPTIONS=""

After this, the build no longer produces any LTO warnings or errors:

adepudinesh@UoSHPCLab (master *) /home/adepudinesh/post_doc/softwares/Cabana/build $  
|  HPC LAB UoS => make install
[  1%] Building CXX object example/core_tutorial/01_hello_world/CMakeFiles/HelloWorld.dir/hello_world.cpp.o
[  3%] Linking CXX executable HelloWorld
[  3%] Built target HelloWorld
[  5%] Building CXX object example/core_tutorial/02_tuple/CMakeFiles/Tuple.dir/tuple_example.cpp.o
[  6%] Linking CXX executable Tuple
[  6%] Built target Tuple
[  8%] Building CXX object example/core_tutorial/03_struct_of_arrays/CMakeFiles/StructOfArrays.dir/soa_example.cpp.o
[ 10%] Linking CXX executable StructOfArrays
[ 10%] Built target StructOfArrays
[ 11%] Building CXX object example/core_tutorial/04_aosoa_advanced_unmanaged/CMakeFiles/AdvancedUnmanagedAoSoA.dir/advanced_aosoa_unmanaged.cpp.o
[ 13%] Linking CXX executable AdvancedUnmanagedAoSoA
[ 13%] Built target AdvancedUnmanagedAoSoA
[ 15%] Building CXX object example/core_tutorial/04_aosoa/CMakeFiles/ArrayOfStructsOfArrays.dir/aosoa_example.cpp.o
[ 16%] Linking CXX executable ArrayOfStructsOfArrays
[ 16%] Built target ArrayOfStructsOfArrays
[ 18%] Building CXX object example/core_tutorial/05_slice/CMakeFiles/Slice.dir/slice_example.cpp.o
[ 20%] Linking CXX executable Slice
[ 20%] Built target Slice
[ 21%] Building CXX object example/core_tutorial/06_deep_copy/CMakeFiles/DeepCopy.dir/deep_copy_example.cpp.o
[ 23%] Linking CXX executable DeepCopy
[ 23%] Built target DeepCopy
[ 25%] Building CXX object example/core_tutorial/07_sorting/CMakeFiles/Sorting.dir/sorting_example.cpp.o
[ 26%] Linking CXX executable Sorting
[ 26%] Built target Sorting
[ 28%] Building CXX object example/core_tutorial/08_linked_cell_list/CMakeFiles/LinkedCellList.dir/linked_cell_list_example.cpp.o
[ 30%] Linking CXX executable LinkedCellList
[ 30%] Built target LinkedCellList
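
For anyone hitting the same thing, it may help to check what CMake recorded for MPI_CXX_COMPILE_OPTIONS before clearing it. The following is only a minimal sketch: the helper names show_mpi_compile_options and has_lto_flag are made up for illustration, and it assumes (not confirmed in this thread) that the MPI compiler wrapper was contributing a -flto flag, which would explain the lto-wrapper message. It reads the NAME:TYPE=VALUE entries that CMake writes to CMakeCache.txt in the build directory.

```shell
# Hypothetical helper: print the MPI C++ compile options stored in a CMake cache.
# Usage: show_mpi_compile_options path/to/build/CMakeCache.txt
show_mpi_compile_options() {
    cache_file="${1:-CMakeCache.txt}"
    # Cache entries are written as NAME:TYPE=VALUE, one per line
    grep -E '^MPI_CXX_COMPILE_OPTIONS(:[A-Z]+)?=' "$cache_file"
}

# Hypothetical helper: succeed if those options mention link-time optimization.
# Assumption: an -flto flag here is what triggers the lto-wrapper step.
has_lto_flag() {
    show_mpi_compile_options "$1" | grep -q -- '-flto'
}
```

From the build directory, something like `has_lto_flag CMakeCache.txt && echo "MPI adds LTO flags"` would show whether the second cmake call is needed. With MPICH, `mpicxx -show` prints the underlying compiler command line, which is another way to see which flags the wrapper adds.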

Glad I was able to fix it.

Thank you very much. :) Closing it now.